Slashdot Mirror


Transmeta Code Morphing != Just In Time

Andy Armstrong has written us a pretty interesting, but somewhat technical piece of Just in Time, computer generated assembly language, profiling, and more. Its a pretty interesting little bit that ought to give you a lot to talk about.

The following was written by Slashdot Reader Andy Armstrong

Transmeta Code Morphing != Code Morphing The recent Transmeta announcment crystalised something I've been thinking about for a while. It's my belief that it should be possible to make a compiler generate much better code in the general case than someone writing hand coded assembler. Furthermore it should be possible for a JIT (Just In Time) compiler to produce better code than a conventional one time compiler.

Why should a compiler be better than an experienced assembler programmer? Well,

  1. the compiler can know the target processor intimately (cycle times, impact of instruction ordering, etc.)
  2. the compiler gets to re-write the entire program each time it sees it.

The second point is critical: any programmer writing an assy. language program of any significant size will write the code to be maintainable. Of course, it makes sense to do things like defining standard entry and exit sequences to routines, keep a few registers spare (in those architectures that have more than a few) and other practices that lead to maintainable code, but the compiler doesn't have to maintain the code it writes. It gets to write the whole thing from scratch every time. This means that functions can be inlined (and repeated code sequences turned into functions). Loops can be unrolled then rolled back up at the next compile when the programmer has decided that space is more important than speed.

If you know you're not going to have to maintain a bit of code you can do some pretty scary things to get it to perform better. A compiler can potentially do that all the time.

Why should JIT be better than one time? This should be clear to anyone who's followed the Transmeta story. A key element of their code morphing technology is that they insert instrumentation into the code they generate, effectively profiling as it runs so that the compiler can decide which bits to optimize the next time it sees the code. It's well known that the coverage graph for a typical programme looks extremely spiky - the most frequently executed code may get hit thousands or millions of times more than it's neighbours. It follows that it really isn't worth optimizing the stuff that only accounts for a millionth of the code's execution time.

This brings me to my point: is there really any reason why a Java / JIT combination shouldn't result in code that executes as quickly as the equivalents in other languages?

You might suggest that garbage collection must slow things down, but I'm convinced that, done right, garbage collection can actually improve performance. malloc()/free() require the memory manager to think about the heap for every call, but new() can be implemented as a stack (m = memlimit; memlimit += size; return m) and the garbage collector gets to do all its memory management in one chunk - it can take an overview of the memory landscape rather than trying to keep things fairly optimal each time memory is released as free() must.

You could argue that the OO nature of Java means that it must dynamically allocate objects that would be static in a program written in C or assembler. That's true, but (assuming calls to new() are cheap, which I believe they can be) this really isn't a problem. Current processors don't take a huge performance hit when working with objects who's address is not known at compile time; in fact in many architectures it makes no difference at all.

So while it might seem profoundly counter-intuitive can anyone actually give me a good reason why Java + JIT should be slower than Good Programmertm + Assembler?

454 comments

  1. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

    So, yes, an assembly programmer can outrun most any compiler. But does it matter? Almost never.

    Depends on the application.

    Nobody here is talking about the fact that big piggy compilers use lots of resources when you're talking about very resource-tight applications. I've written a fair bit of embedded Assembly code for 4 and 8 bit single chip designs where there was 128 or 256 bytes or maybe even 1K of read/write memory in total for the whole system. Guess what: that means you don't get to have a huge stack.

    The only use for C in such an environment is to steal math routines out of it by compiling to ASM and reverse engineering them.

    And there wasn't even any C available for the NEC 4-bitter I spent so much time coding.

  2. Re:No. Hand coded assembler is faster. by Anonymous Coward · · Score: 0
    Humans are good at optimizing small isolated sections of code. And with with much, much diligence can occasionally beat the compiler. However, in the more general case the good compiler will beat the good human, unless of course you consider a borg-like automotan a human (and not a compiler).

    The reasons - (1)humans get bored and lazy. They can't do even fairly basic optimizations over large programs consistently. So sure sometimes a human may do better for a small stint, but we are talking whole real world programs like an OS, database engine or Mozilla.

    (2) Humans are error prone. Even good clueful ones. That means that programming in asm introduce bugs, and at times means screwing up the optimization but still generating correct output. Less buggy is better and higher level will yield less bugs.

    (3) Some optimizations just are not tractible for humans to do in practice. Sure I can sort an a list of 2500 names, sorting is not hard, but on past a certain scales it just is not practical to get humans to do it. Computers scale very well for many types of problems much better than people, and many of those types of problems are useful for generating faster/better code.

    Slothmonster

  3. Re:Profoundly counterintuitive? by redhog · · Score: 2
    I will only comment on one of your point, while you are, in my opinion, right, and insightfull (You are insightfull even on this one, but not totally right):
    When given an environment like Java which doesn't allow pointers and other antiquated memory hniques a good dynamic compiler with a modern garbage collector is both faster and more efficient than any hand coded attempt on all but the most simple of applications.
    The point is that the average application has inflated enormeous over the years. Thanks to stupid programmers, abstraction and cool languages. You gave the key to this yourself "on all but the most simple of applications"...
    Not that a good compiler cannot outperform even a good assembly hacker on, for instance, memmory allocation. And not that abstraction in itself must lead to inneficiency, but it makes it inherently easy to allocate/free memory for the programmer, who will then of course allocate/free more memmory.
    --The knowledge that you are an idiot, is what distinguishes you from one.
    --
    --The knowledge that you are an idiot, is what distinguishes you from one.
  4. Re:Compilers dont write better code than humans by interiot · · Score: 1
    There are still many other things that the programmer knows but isn't able to express using today's languages (or wouldn't want to).

    For instance, a line in a program might be "a = b % 255". But the 255 might not be set in stone, it could differ by up to 10. Computer languages can't express this, but if they could, the computer could figure out that it could change the 255 to a 256 (here and possibly elsewhere in the program) and make it "a = b & 0xFF" which is much faster.

    Granted, this is another trivial example, but there are many more situations like this where optimizations are possible. The more we tell the computer, the more it can figure out. The boundaries of what programming languages can express and what optimizers can do are expanding all the time, but until we tell it every nuance about the program, a human will still be able to use that knowledge to make faster programs.

  5. Re:Evolve code. by Geotrash · · Score: 2

    Other engineers are required to use the minimum resources to achieve the maximum requirements of the project design, shouldn't computer 'engineers' be able to acheive the same.

    Apples and oranges. The cost of creation materials for say a building, demands that the least amount of them be used. Not so with bits and bytes. These days, they're almost free. I applaud the true craftsmen of the programming world for their principle and talents, but with all due respect, to me it's penny wise, pound foolish to waste time reinventing the wheel each time you have to write code. Did you dig your home's foundation with a trowel?

  6. Re:Profoundly counterintuitive? by Chuan-kai+Lin · · Score: 1

    Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?

    I haven't done that myself, but some other people had. The FFTW people have created a DFT library written in ANSI C that outperforms nearly all other DFT libraries, many of which are provided by the hardware vendors who hire the best programmers and use assembly code extensively. Most of the code in the library are generated by a program written in OCaml.

    And judging from the number of patents and papers in IEEE journals, I believe that creating a fast DFT library is indeed a complex and creative problem.

  7. Re:Java Byte Code by BrianJAckermann · · Score: 1

    Java really doesn't 'require' one to have a browser, even. It requires a JIT or like concept to run the code (VM's included...). More and more, Java use will likely split into split camps. It is trivial, these days, to actually create an executable native code (machine code) java program. Modern compilers are beginning to include options for this at an increasing rate. I KNOW from experience that Metrowerks codewarrior tools give one this ability. On the other side, that same code can still be compiled into Java BC, and run with VM and JIT and the like. So, this is changing rapidly, and in the future we should see even more changes to bring Java more into the 'main stream' of programming, particularily when all the assorted threading and timing and other difficulties are sorted out... Brian

  8. Re:within X% of the performance of native code by Anonymous Coward · · Score: 0

    No, it's not an alpha or a k7

    I am wondering if it's even a K6. They certainly spent enough time at the dog and pony show preaching about "the benchmarks need to be adapted" instead of quoting performance numbers.

    It's almost certainly as good as a 486sx, though. Faster, at least.

  9. Re:Profoundly counterintuitive? by binkley · · Score: 1

    Although I agree in general with your comments, your 4th point is incorrect. Just as with C++, the Java compiler is often able to detect when a method call does not need to be virtualized, and does not need to go through indirection.

    This is especially so for two important cases: inlining function calls, and finalized classes (where the programmer instructs the compiler that there will be no subclassing).

    --binkley

    --
    --binkley
  10. Re:Profoundly counterintuitive? by Kvort · · Score: 1

    Have you ever written a large program? Even on a command line? On a CLI you can do assembler or C, because they're linear in nature. C/C++ was not created so that programs could use less CPU cycles; it was created so that programs could be written more quickly (programmer time), more robustly, and vastly larger than their assembly counterparts.

    I'm certain you can write a 500-instruction assembly program that will run faster than a similar program I write in C++. But the scope of problems you can solve in those 500 lines are so unbelievably small that its not worth the time to test it.

    >>>>>>>>>> Kvort

    --
    -Don't mind me, I'm personality-deficient and mentally-impaired.
  11. Re:Evolve code. by DQuinn · · Score: 4

    Sorry... this post is probably meant to be deeper inside this thread.

    Are we not all just dancing around the idea of a human compiler with a vast database of algorithms, intuition, look-ahead properties, and an abstracted idea of what it is exactly that the program is doing? IE. There is no compiler, to my knowledge, which writes highly effective parallel code. Why is that? Simply because no compiler can really understand the overall problem.

    How long do you think it will be before the all-powerful AI compiler is written, with all of these abilities, and more, on a box that runs at a few GHz?

    Or maybe i'm just being a little too science fictiony here :)

    Cheers,
    D

    --
    os.system("perl -e 'print \"My first Python Script.\"'")
  12. See "Engineering a Compiler" by vaxheadroom · · Score: 1

    You might want to read "Engineering a Compiler" by Patty Anklam, Dave Cutler, Roger Heinen, and Doug MacLaren, from Digital (as in DEC) Press, ISBN 0-932376-19-3. (Dave Cutler was one of the architects of VMS and, eventually, Windows/NT.) This is the team that wrote the Dec VMS compilers (PL/1 and Vax-C). If I remember correctly, their back-end code generator was used for several (most? all?) other Dec VMS compilers. Dave's thesis was that a good compiler could outperform all but the best assembler programmers (and Dave was an awesome assembler programmer on the PDP-11 and Vax architectures). If I remember the book correctly (it's at home), their idea was to generate straight-forward assembler code and pipeline-optimize everything by detecting and rewriting assembler code sequences.

  13. Re:The "C is not portable" myth by arivanov · · Score: 2
    Take a look at a few of the most widely ported software packages around. How many of these redefine the types in order to "know the size of their types"? Not many, I can tell you.

    Quite a few. Amazingly these are usually those that have portability problems. PGP 5 is a good example.

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  14. Re:Compiled Once vs. Compiled multiple times by mangino · · Score: 2

    What if you compiled the original code with performance analysis included, and then fed that performance analysis back into a traditional compiler to execute again. In this way, you get all of the benefits of the above JIT system without the running overhead of the JIT. Sure, it takes a few thousand runs to get the program optimized, and you can only optimize for the average case, but most programs that need to be heavily optimized will have a roughly similar executaion pattern every time.

    I may be wrong, but you can do this with the SUNWspro workshop compilers. This technology would be nice to have in gcc as well.

    Mike
    --
    Mike Mangino Consultant, Analysts International

    --
    Mike Mangino
    mmangino@acm.org
  15. What are you saying? by Improv · · Score: 1

    I hope you don't mean that you think a "good"
    assembler will produce fast but incorrect code..

    --
    For every problem, there is at least one solution that is simple, neat, and wrong.
  16. Re:Transmeta Speed by mohaine · · Score: 1

    Of course, that is why you could do it in a seperate (but identical) CPU without any problems. The code optimizations could be much better then what you could get out of a hardware only solution.

    --
    (appended to the end of comments you post, 120 chars)
  17. Re:Bless you! by Anonymous Coward · · Score: 0
    That funky GC is not only easier to use than manual memory allocation, is bugfree unlike manual memory allocation, but it also faster than manual memory allocaiton. Go to read the white papers on HotSpot. I exists today and is faster today.

    As for RTTI, it allows you to do things that you cannot do without it. Things like type saftey, and runtime loading of classes. More importantly it make formal analysis of Java programs possible, unlike C/C++ programs. In the future those tools will be critical - because they will be the key to producing near bug free code. Being able to state with supreme confidence that this type of bug does not exist in my code because it has been rigorously proven is helpful. You really should be more concerned with running correct code than running "fast" code.

    Now if you want to design a better byte code format for a Java like language feel free. Researchers all always looking to improve those sorts of things. Java uses a stack based machine - this is good in that makes .class files small. What other (dis)advantages are you aware of?

    Message to the C/C++ world - you spend too much time writing software to waste time on a non type safe langauge.

  18. Re:Compiled Once vs. Compiled multiple times by JWRose · · Score: 1
    The problem I see with caching is that the cache is only available for so long. Let's say you have a program(program X) that is only intended to run 1 a week, 1 a month, or even 1 a day. In between each run, countless numbers of other programs could have run using up the cache. The next time program X runs, it is no longer cached and has to be recompiled before execution. Whereas, had program X been conventially compiled originally, it wouldn't make any difference how frequently the program runs. Granted, I will concede that for programs that are run frequently, the process you and others have described, might produce faster code.


    Nothing exists exept atoms and empty space; everything else is opinion.

    --

    blah blah blah....
  19. Re:Profoundly counterintuitive? by Hard_Code · · Score: 2

    I think you may be missing (or misunderstanding) the point a bit. Sure, you can furbish an example, or a language, in which a human can optimize the program /statically/ better than a compiler/computer can optimize a program /statically/. But the author makes the point that the most frequently executed parts of the program are very sparse, and that the cost of profiling the program dynamically can be recouped by doing intelligent dynamic optimizations. As hardware gets faster and more complex I think this will be more true. There will simply be no way for humans to statically optimize for such a non-deterministic runtime environment. For example, how many programmers explicitly specify parallelism in their programs? How can they be sure that the CPU at runtime doesn't know better than them how to optimize the program? I say let the CPU decide. There is going to come a point in which it is just not feasible for humans to even attempt to optimize for such a complex runtime environment, and that time is coming fast. The faster chips get, the smaller the more amortized the overhead of finding better optimizations at runtime becomes.

    Jazilla.org - the Java Mozilla

    --

    It's 10 PM. Do you know if you're un-American?
  20. Re:Compilers dont write better code than humans by Junks+Jerzey · · Score: 3

    You're right, of course. No matter what some people have been trying to say for 40 years or more, you can almost always write better code than a compiler. This is especially true when you're dealing with something bigger than a a single function. The usual way of showing that a compiler is better than a human is by using one smallish C function as an example. That's a pointless example, because the benefit comes from analyzing that function in context and not on its own.

    The point of diminishing returns comes into play quickly, however. For example, take the renderer for most any fast 3D game. If you went into the routine that passes triangles to the graphics card, a frequent hot spot, and added a pointless call to a supposedly expensive function, like sqrt, you're not going to notice an effect on frame rate. Doing a higher level optimization, like removing a single polygon from a model, is going to be more of a benefit than optimizing the polygon code, but it's still not going to be noticible. Sometimes you can get big benefits by using algorithms that look more complicated, ones you wouldn't want to approach in assembly, even though they use more code.

    With complex programs, it's even conceivable for an interpreted language to out run a compiled one, because everything comes down to architecture and an understanding of the problem. This is hasn't been the case with Java, because Java is a fairly low level language (the more abstract a language is, the less win from compliation) and because Java has become entrenched in the "learn programming in 14 days" market of web designers turned programmers.

    So, yes, an assembly programmer can outrun most any compiler. But does it matter? Almost never.

  21. 6502 by tilleyrw · · Score: 1
    I shapened my teeth on a 6502 when I first began eating assembly code. It is a very cool processor.


    Just wanted add this to the discussion.

    --
    This post encoded with ROT26. If you can read it, you've violated the DMCA. Handcuffs please, sergeant.
    1. Re:6502 by Anonymous Coward · · Score: 0

      You like 6502? Check this out. 6502 to x86 JIT compiler.

  22. Why do "dynamic" single platform compiling by GrEp · · Score: 1

    This my be a little off the thread of thought here, but why aren't there any one-time "dynamic" compilers. What I mean by this is a program that monitors watches another program run and then gives a detailed report of where the preformance problems in the code, which would be used for optimization.
    Example: Load up your dynamic compiler and have it complile a program, lets say Quake. After comiling it just like gcc Quake would then load. The user then play Quake for a while to demonstrate normal usage; meanwhile the "dynamic" compiler would be taking notes in the background as to where 90% of the cpu time is spent. After the user is done playing Quake the "dynamic" compiler would look at the notes it just made and optimize those parts of the code that are a preformance bottleneck.

    This would be a much better approach than optimizing blindly with no indication of where the optimization needs to be.

    Although pulling out your BFG might be sped up by 30ns unrolling a couple loops it wouldn't be worth the extra size of the program. However a little loop used in rendering the background unrolled might make a huge preformence boost, that would more than justify the extra program size. How would a "static" compiler know this? It wouldn't because it doesn't know how the code is used in a real world situtaion.

    So:
    1. Are there any similar optimizing compilers out there? And where can I get my hands on one?

    2. If there isn't one there should be an intiative to make one.

    --

    bash-2.04$
    bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
  23. Don't rely on the Compiler to Optimize by ArcadeNut · · Score: 1

    Optimization at the level that the compiler works on, is generally a last step for optimization. I agree that a compiler can do that type of opimization better than most people. However, to truly optimize your app/function the compiler will fail simply because it has no clue as to what the function/app is supposed to be doing.


    You will get a bigger bang for your buck, optimization wise, by changing the implementation rather than optimzing CPU cycles.


    A really good book on this subject is called "The ZEN of Code Optimization" by Michael Abrash. The book covers almost every aspect of optimization including things like knowing when to optimize and when not to. Its geared towards the Pentium processor and has most examples in C and Assembler.


    Another problem that you could run into with the compiler optimizing for you is that it might change the intent of the function/app. If it does that then you will be hard pressed to debug your app unless you enjoy tracing through the assembler. Most if not all compilers don't have this problem anymore.



    If you want it to go faster, change the way your doing it.

    --
    Visit the Arcade Restoration Workshop @ http://www.arcaderestoration.com
    1. Re:Don't rely on the Compiler to Optimize by PigleT · · Score: 2

      I think I agree... :)

      There is indeed an art - if not a Zen - and a science to knowing when to optimize stuff, and I think it boils down to 'use the hottest algorithm around' (especially if you start in C), *followed* by 'hack the ASM by hand'.

      Something I finally got round to doing just this past weekend, that I'd been meaning to do for a while, is it compare the speed of execution of loops that simply count between 0 and a big number, but using different comparison methods, and different compiler optimization levels in gcc, with Perl for the hell of it as well.

      The full source I used follows:

      #include
      #include
      #include

      #define INNER 5000000
      #define OUTER 200

      void up(void), down(void), xor(void);

      int
      main (void)
      {
      int c;
      time_t t0, t1, t2, t3;

      t0 = time (NULL);

      for (c = 0; c Mosix cluster, which was "interesting", to say the least... ;)

      As for perl, well that behaved the way I expected. The whole thing was way slower than C, but it did at least do an xor when I asked for it - that was almost invariably faster than counting up with "".

      Well, those be the results... compile the source yourselves with all the various gcc options and compare the assembler, folks :)

      --
      ~Tim
      --
      .|` Clouds cross the black moonlight,
      Rushing on down to the circle of the turn
    2. Re:Don't rely on the Compiler to Optimize by PigleT · · Score: 2

      Aaaargh!
      CmdrTaco, your preview mode is broken. That worked for me before.

      In any case, the source is available here.

      --
      ~Tim
      --
      .|` Clouds cross the black moonlight,
      Rushing on down to the circle of the turn
  24. Why not do "dynamic" single platform compiling? by GrEp · · Score: 1

    This my be a little off the thread of thought here, but why aren't there any one-time "dynamic" compilers. What I mean by this is a program that monitors watches another program run and then gives a detailed report of where the preformance problems in the code, which would be used for optimization.

    Example: Load up your dynamic compiler and have it complile a program, lets say Quake. After comiling it just like gcc Quake would then load. The user then play Quake for a while to demonstrate normal usage; meanwhile the "dynamic" compiler would be taking notes in the background as to where 90% of the cpu time is spent. After the user is done playing Quake the "dynamic" compiler would look at the notes it just made and optimize those parts of the code that are a preformance bottleneck.

    This would be a much better approach than optimizing blindly with no indication of where the optimization needs to be.

    Although pulling out your BFG might be sped up by 30ns unrolling a couple loops it wouldn't be worth the extra size of the program. However a little loop used in rendering the background unrolled might make a huge preformence boost, that would more than justify the extra program size. How would a "static" compiler know this? It wouldn't because it doesn't know how the code is used in a real world situtaion.

    So:
    1. Are there any similar optimizing compilers out there? And where can I get my hands on one?

    2. If there isn't one there should be an intiative to make one.

    --

    bash-2.04$
    bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
  25. Re:Only when you can rewrite the whole thing by rmstar · · Score: 1

    Part of the argument was in fact compilers beeing better at writing fast code than assy. programmers. I HAD to say something :)

    Anyway, usualy normal portable code does the trick if you are not running realy time critical stuff. And if you do, then no compiler will.

    We are at a point in this industry where most advances won't matter to almost anyone that would spend a thought on it. There is no difference (to the vast mayority of users) between a task taking a nanosecond or a femtosecond. Or whatever.

    rmstar

  26. Re:Ooo, ooo, start on packing! by ucblockhead · · Score: 2

    Would you also send your pointers across a network and expect them to work at the other end?

    The idiots who wrote the code that caused me the pain would have...

    This involves one of the internal debates I've had about C for years. C allows you to shoot yourself in the foot. That is all well and good as I (I hope) know what I'm doing. The trouble is that, with turnover in this industry what it is, a new foot might have arrived by the time the bullet hits. (Unfortunately, I've often been the new foot.)

    --
    The cake is a pie
  27. Re:Match code frag. as opt. tech. (Was:Evolve code by slashdot-terminal · · Score: 2

    1.Please give an example on the type of code you'd want to optimize, any reasonably simple (small fragment of )ecode that doesn't depend on advanced integer arithmetic should compile to something as fast or faster than you
    would possibly write yourself... Please include example.c or .cpp, GCC input, GCC assembler output and your "better" ASM output.


    I was talking about a simple brute force compare and if it dosn't work then move on type thing. I think that this type of thing will always work (just take a little while). Considering that most modern computers are extremely powerful then I have to say that such brute force is still possible. What I don't get and still don't get is how people get all the time to actually research all this stuff. Does anyone have a functional life anymore? I am took courses for an associate of science and they didn't get anywhere as complex as all the little theory arguments as I have seen here.

    1.Invartiant detection, using compilers to find invariants in the code, and then deduce lemma's which may remove bounds-checks, type-lookups, casting or flush of registers to memory.

    Could someone please explain what a lemma is I'm sorry I guess I didn't take Phd level math courses.

    --
    Slashdot social engineering at it's finest
  28. Re:Experience from Optimizing Java ... by radish · · Score: 1


    Actually, some industrial strength JVM's (i.e. not M$ or browser based ones) run with a preset memory allocation. For instance, we run the sun JVM on Solaris with an allocation of, say, 512mb. That means it's running it's own memory allocation within that space, which it doesn't have to share with any other processes. So of course your argument about it having to malloc() twice is avoided. In fact (and my knowledge of memory allocation models is not good enough to be sure) it kind of suggests to me that because the JVM is allocatin within a "safe" environment it may be able to cut corners and actually do the malloc faster than native OS calls.

    --

    ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

  29. Re:Evolve code. by mistabobdobalina · · Score: 1

    ho ironic would it be to see engineers getting replaced by machines...i could just see a protest outside microsoft by unemployed computer profesionals!!!

    --
    -- your knees hurt, don't they?
  30. Evolve code. by Anonymous Coward · · Score: 3

    "In the general case", perhaps a compiler will produce better code (and certainlty not, as yet, for specialised cases - an inner loop coded direct to PPC macro assembler (PASM on Amiga PPC) is the fastest code I've ever written, but humans still write compilers. Now, if they were to force-evolve code using a genetic algorithm, you might get code better than any a human code write, but it'll probably depend on some weird side effect to some obscure instruction, or the rtesosnant frequency of your ram bus, or something...

    Cool article, all the same though. I'd love to see more of this sort of thing on Slashdot, like back in the old days.

    1. Re:Evolve code. by Pentagram · · Score: 1

      >Cool article, all the same though. I'd love to see more of this sort of thing on Slashdot, like back in the old days.

      I second that! Nice article.
      open angle bracket-slash-AOL-close angle bracket

      A bit technical, says CmdrTaco. Wow! I understood every word. Perhaps those lectures are starting to filter in at last. Or general Slashdot osmosis.

      It amazes me how much assembly still gets written. Except for a few obscure applications we should have the technology to be able to abstract the assembly out by now.

    2. Re:Evolve code. by Anonymous Coward · · Score: 0

      Here in CBORG land we believe that the best and fastest code is written by greasy faced C programmers high on Mountain Dew at 2:30 in the morning. If you have a problem with that it may indicate that your implant is loose or malfunctioning. Please report to central maintenance immediately.

    3. Re:Evolve code. by Anonymous Coward · · Score: 0

      If you'd like to use programs that execute in 2K you can probably still find them along with an OS to run them on. On the other hand, if you'd like to run modern applications with all the bells and whistles, then you'll have to cough up a little more memory. Your choice!

    4. Re:Evolve code. by gfxguy · · Score: 2
      There's also lot's of valid reasons for code bloat. Well, maybe not real "bloat", which would imply lot's of frivalous code. What I mean is that there are lot's of valid reasons why programs are growing ever larger.

      The use of libraries, even for a small number of functions, increases code size (thus the 20k "Hello World" program). However, if we proceed to enhance the "Hello World" program to do a lot more, including input and output and simple processing - the code does not grow proportionately. In other words, a program 100 times bigger and much more useful than "Hello World" may only take up twice as much space.

      Code size increases today are mainly due to interface issues. Very little interface code in Word Perfect 5.0 for DOS compared to Word Perfect 2000. It's a necessary part of the modern program.

      I just don't think it's crappy compilers, or necessarily crappy code writing, that's to blame. It's feature bloat. And like anything else, as it becomes larger it becomes harder to steer.

      The fact is, though, wether we need the features or not, programs today do a lot more than they used to. Word Processors give you many more format options, fonts, merge features, etc., then they ever did. Add the big GUI code (90% of most of my programs ends up being GUI stuff these days), and you get a modern "bloated" program.

      I hate to say it, and I write command line utilities as much as possible, but when your target audience expects a visual interface, it adds a LOT of overhead to what might otherwise be a simple program. How big is "Hello World" compiled using Xtoolkit or Motif?!? Not to mention the libraries that may need to be loaded at execution time, making the perceived size even larger.

      I'm not excusing poor programming, and I certainly have a dislike of "Visual" programming tools for my own reasons, but code growth is a necessary evil.


      ----------

      --
      Stupid sexy Flanders.
    5. Re:Evolve code. by mubes · · Score: 1
      On the subject of evolving code, try a look at;

      http://www.forces.org.uk/activ/ws9911.html

      Scroll down to the work by Peter Martin. ...I'm sure there's other work being done in the area, but that's the first I've seen with 'real' application.
    6. Re:Evolve code. by Anonymous Coward · · Score: 0

      Why do you think that everyone and their mother is going to those "schools" that teach people how to do network admining and such?

      Because "operator" positions (to use the old term for differentiating between drones and creative people) pay good enough, while allowing the individual to be a stupid beer drinker/dope smoker who watches tee vee every night after putting in 8 hours.

      Lots of people just want an "operator" position (i.e. sysadmin, cable puller, CNE, MSCE, Helpdeskjockey, etc.) because they're not that interested in creative work (i.e. software engineer, EE, various R&D type jobs.)

    7. Re:Evolve code. by AxelBoldt · · Score: 1
      Do you really want to be out of a job because the computer can just write that new app by itself?

      We will all be out of a job eventually. That's the whole point of technology: machines doing the work for us. We scientists will go last, but we too will go. The only question is: can we design a decent welfare state quick enough for the time when retirement begins at birth.

      --

    8. Re:Evolve code. by CFN · · Score: 1

      I have to disagree.
      Just search monster.com and see how many jobs there are for ASM programmers.
      Some people will always be needed for special ASM work, but today's software is way too complex to be coded in ASM and today's hardware is way to complex for a human to code.

    9. Re:Evolve code. by bgood · · Score: 1

      Oh, I think we've got to give people and systems more credit. There are many systems that are incredibly complex, yet the human mind both imagined and conjured them. I don't think this concept is anymore exotic than that of genetic algorithms (which, when first introduced, seemed pretty farfetched). Of course, JIT coding probably only optimally applies to certain problems.
      And, it seems that there should be no fear of losing work to JIT systems. There's always new problems to solve and old bugs to fix.

    10. Re:Evolve code. by Alton · · Score: 1
      Do you really want to be out of a job because the computer can just write that new app by itself?

      First, the majority of apps that people are actually making money on don't require high speed. 99% of the apps I write could probably be made 20-50 percent faster, but the end users wouldn't be able to tell the difference and wouldn't care.

      Second, not only would the computer have to write the program, but it would have to gather the requirements from the users and translate that into something understandable by the programmer, either human or machine. It has been a long time since I wrote a seriously difficult algorythm. More of my time is spend gathering requirements and figuring out how different end user requirements merge together. This is something that compilers and computers in general are a LONG way from being able to do efficiently and effectively.

      We created C++ and Java for the reasons of creating better code and creating maintainable programs. We must really look objectively and say is creating stuff for lazy people really worth the effort.

      The answer to your question is yes. The people who use C++ and Java and VB and PowerBuilder (like myself) are not always lazy. The code I generate brings my company in about $100,000 revenue (after my salary) each year. The more coding I get done, the more revenue I generate, and, as I said before, the end users cannot tell the difference in speed. The applications wait on the users MUCH more than the users wait on the application. I spend more time optimizing critical areas, and often times, areas that are not critical get no optimization, and most of the time the user will neither care nor notice. C++ and Java are not designed for "lazy" people. They are designed for people who need to accomplish a task quickly.

      I'm not going to be out of a job ANY time soon because someone writes an AI C coder/compiler.

      --
      "Anyone who can't laugh at himself is not taking life seriously enough." - Larry Wall
    11. Re:Evolve code. by lollypop_man · · Score: 1

      What's wrong with just getting people to actually do the design work themselves. We created C++ and Java for the reasons of creating better code and creating maintainable programs. We must really look objectively and say is creating stuff for lazy people really worth the effort.

      I always thought that lazyness was an essential part of programming, because if we weren't lazy, we wouldn't spend our time trying to find easier and shorter ways to implement our algorithms.

      And anyway, if we get a computer generated Compiler, it's still just a compiler. It still needs source to compile, so programmers will still be needed. We just won't need many ASM programmers. Arrghhhh! I'll just have to Assemble for fun instead then!

      --
      Quote from the mothership:- If programmers are inherently lazy,then I must be the greatest programmer ever!
    12. Re:Evolve code. by Alex+Belits · · Score: 1

      People who put down sysadmins should be forced to wake up at 3 a.m. to fix an outage every couple of weeks until they retract their statement. If they never retract it, good, more sleep for everyone else. Hrmph.

      Please, look, at the comment, to which I was answering -- exact opposite of what you are referring to.

      --
      Contrary to the popular belief, there indeed is no God.
    13. Re:Evolve code. by Anonymous Coward · · Score: 0

      again, like an earlier poster said, the differences are for the GUI/visual stuff.

      Word is pretty quick in non-"page layout" mode. It used to have an even rougher view mode, but that has gone by the wayside.

      Wordstar seems faster simply because it doesn't have to do that visual rendering all the time. As far as typing lag, I've never had a problem overtyping a decent PC, and I'm a pretty fast typer. What do you type, about 200 WPM or something insane like that?

      As far as missing functionality in Wordstar...how about pulling data from a database for a mail merge, via some reasonable, quasi-open interface standard like ODBC...

      However... the launch time for Word2000 on my lowly P200MMX blows. It is definitely slower than Word97 (which was slower than Word6, which was slower than Word2).

      Will feature bloat ever end?

    14. Re:Evolve code. by Anonymous Coward · · Score: 0

      With all the talk about code Evolving, I'll be buying stocks in SkyNet the second they're offered. However, I'll also be cashing them in just as soon as the run up stops--knowing that the world will end soon afterwards.

    15. Re:Evolve code. by lollypop_man · · Score: 1

      LOL

      They'll probably get monsato to test it here in the UK first!

      --
      Quote from the mothership:- If programmers are inherently lazy,then I must be the greatest programmer ever!
    16. Re:Evolve code. by Anonymous Coward · · Score: 0
      weird side effect to some obscure instruction, or the rtesosnant frequency of your ram bus

      New Scientist recently (last year, not online) did an article on a team who were solving algorithms with neural networks made out of a FPGA chips.

      In some extreme cases, they found that the circuits developed were using using inexplicible physical side effects.

    17. Re:Evolve code. by cranq · · Score: 1

      In general, developers sacrifice time and space efficiency to get faster development.

      It happens on a small scale when you get something going that's "good enough" and you don't want to reimplmement to make it more efficient.

      It happens on a large scale when you use prebuilt components that have more functionality than you need, but hey, they're there and they get the job done.

      It happens on a really large scale when people hatch Bad Plans and they become standards. Example: the horrid MS DLL plan. Designed to make apps smaller, but the app installers can't guarantee that the target machine has the latest DLLs, so it has to carry around all the DLLs that the app uses, even though the app uses only a fraction of the code in those DLLs.



      Regards, your friendly neighbourhood cranq

      --
      Regards, your friendly neighbourhood cranq
    18. Re:Evolve code. by JWRose · · Score: 1
      Ahh, they may not advertise for it, for specialized things, it would be desired. ASM, as far as I know, has/is being used for specialized things. Of course, ASM is frowned upon since, by it's nature, is not portable. Yet, where speed is of the utmost importance, ASM is still being used. Hell, I've even seen ASM code on the Mainframe. ASM may not be in the lime light, but that doesn't mean it's not being used or is not desired.

      Nothing exists exept atoms and empty space; everything else is opinion.

      --

      blah blah blah....
    19. Re:Evolve code. by Anonymous Coward · · Score: 3
      For deeply pipelined, agressively superscalar architectures, I fail to believe that a human being could possibly solve the difficult scheduling and graph-theoretical problems that optimal instruction ordering requires better than a mechanistic compiler. Current architectures are simply too complex and subtle for a project of non-trivial size to be more efficiently coded by humans at such a low level.

      Nevermind of course, that for real prejects, the trade off in maintainability, portability, and extensibility is almost not worth any demonstrable performance gain anyway.

    20. Re:Evolve code. by Anonymous Coward · · Score: 0

      What does buying shares have to do with SkyNet? While you are at it, you might as well buy shares in AirNet Commercial Australia. I know I will.

    21. Re:Evolve code. by Anonymous Coward · · Score: 0
      How about this one? Lots of people want to be "Operators" because they get to spend time with their family, and don't get paged at three in the morning or during their daughters 8th birthday party just to come in to work to fix a a bug they introduced because they were just stupid?

      We also have hobbies that don't include computers, or computer games, and usually involve more than 4 people.

      Dumb twit.

    22. Re:Evolve code. by Anonymous Coward · · Score: 0

      Well... on midnight, August 8th, the central SkyNet computer, using code evolving technology, becomes self aware. In the panic, the humans try to pull the plug. SkyNet retalliates by launching Russia's nukes at its enemies here... etc., etc.

    23. Re:Evolve code. by Bryan+Andersen · · Score: 2

      Nevermind of course, that for real prejects, the trade off in maintainability, portability, and extensibility is almost not worth any demonstrable performance gain anyway.

      Some people just have no idea how true this is, and I've had to work with many like that.

    24. Re:Evolve code. by norton_I · · Score: 1

      That is simply untrue. The cost of cycles is not 0. Consider that it takes longer to launch word 95 on a PII XYZ (whatever they are at these days) than it took to launch wordstar 5 on a 286. It also takes longer to load a file, save a file, reformat a file, and they typing lag is worse. And I will let you in on a little secret: There is almost no functionality missing in Wordstar -- just some of it is harder to get to (but faster to use once you know how!)

      Now, these days, people (evidentally) accept the increased latency in exchange for the ease of use of Word, but my point is, it isn't free. It isn't even cheap.

      Memory and disk are essentially free, but their bandwidth is not, and usually large files == long load times.

    25. Re:Evolve code. by John+Allsup · · Score: 1

      It's a Terminator reference -- SkyNet is the 'evil AI system'
      John

      --
      John_Chalisque
    26. Re:Evolve code. by ehack · · Score: 1

      Evolvable Hardware has been done at a variety of places, with FPGAs. Look up Adrian Thompson and Hugo de Garis and Koza... I myself have evolved neural nets, a lot of people do this sort of stuff. Evolving Software is fun too, see John Koza's Genetic Programming books. Compiler optimization by evolution (the fastest reordering or register allocation survives) is probably feasible at present, if you have an instrumented processor.

      --
      This is not a signature.
    27. Re:Evolve code. by JWRose · · Score: 1
      1. We just won't need many ASM programmers.


      ----

      Why is it that everytime we either talk about a new technology or actually have a new technology people start talking about how we will no longer need the old?

      COBOL is an old technology. Been around for quite a while now. Yet Pascal, C/C++, BASIC, and countless others have come along since then, yet COBOL is still used just as much today, if not more, than when it was first introduced.


      Just because a new technology comes around, that does not make it the best for all possible scenerios. For every problem, there is usually one tool that is more suitable to solving that problem than any other and no tool is the best tool for every problem . Therefore, Assembler/C/C++/etc programmers will be in just as much demand in the future as they have been in the past, no matter how advanced the compilers become. After all, somebody has to be able to code the compilers???

      Nothing exists exept atoms and empty space; everything else is opinion.

      --

      blah blah blah....
    28. Re:Evolve code. by Alex+Belits · · Score: 2

      How about this one? Lots of people want to be "Operators" because they get to spend time with their family, and don't get paged at three in the morning or during their daughters 8th birthday party just to come in to work to fix a a bug they introduced because they were just stupid? We also have hobbies that don't include computers, or computer games, and usually involve more than 4 people.

      And why should anyone other than you care? Your "hobbies" most likely contribute nothing to lives of others, yet the price, society pays for supporting overpaid drones becomes rather noticeable. Why should I care, how wide was someone's smile at his daughter's birthday party if that person was responsible for network (or phone, or power) outage?

      --
      Contrary to the popular belief, there indeed is no God.
    29. Re:Evolve code. by Anonymous Coward · · Score: 0


      The idea of evolvign code is vey similar to the idea proposed by Franz in one of his papers. He suggested that spare CPU cycles be used to apply aggressive optimisation of code *all-the-time*. The background optimizer would try making a new version of a function, and if it was faster or smaller (but not slower) it would cache the new version back into the file. This required storing the parse tree along with the binary code.

      The advantage of this is that you can create very SLOW optimisations which require a lot of processing without making the user wait for it o finish; they can run the roughed up version until its been optimised.

      AFAIK neither HotSpot nor Self environments take advantage of this, especially since they do not cache the executable code between sessions, which is essential to the whole plan, otherwise you'd have to leave your Java program running for a long time, and even then only it would become faster.


    30. Re:Evolve code. by benedict · · Score: 1

      If I attached my reply to the wrong post, I apologize. My sentiment stands, of course. ;-)

      --
      Ben "You have your mind on computers, it seems."
    31. Re:Evolve code. by benedict · · Score: 1

      I hope you don't work with real sysadmins, because 1) that would excuse your ignorance and 2) people who say things like that around real sysadmins often regret it later.

      --
      Ben "You have your mind on computers, it seems."
    32. Re:Evolve code. by Anonymous Coward · · Score: 3

      Now, if they were to force-evolve code using a genetic algorithm, you might get code better than any a human code write

      Of course, you'd have to put up with the fact that this would not be executable in Kansas.

    33. Re:Evolve code. by benedict · · Score: 1

      The operators where I work aren't overpaid drones. They are smart, motivated people, and many of them become sysadmins before too long.

      People who put down sysadmins should be forced to wake up at 3 a.m. to fix an outage every couple of weeks until they retract their statement. If they never retract it, good, more sleep for everyone else. Hrmph.

      --
      Ben "You have your mind on computers, it seems."
    34. Re:Evolve code. by benedict · · Score: 1

      Bitwise and byte foolish?

      --
      Ben "You have your mind on computers, it seems."
    35. Re:Evolve code. by AndrewHowe · · Score: 1

      LOL

    36. Re:Evolve code. by Anonymous Coward · · Score: 0

      Half-baked premonition: the concepts from Snow Crash keep running into cellular automata stuff I've seen and....

      Semantic constructs (program) in chanted melodies of objects (genome)?

      And yes, I really am out to lunch. But seriously, given the parameters one can define the object. So what are the parameters that determine fitness to evolve? Size of code, speed of code, time to create; any, all?

      Assuming nothing, I guess gene expression would occur like; say you have a gene for loop un-rolling, it would only express itself if the unrolled is better. Yes?

      If a gene could supress the expression of another gene, perhaps because it has a better trick (ie. loop summation), it can prevent the 'older' gene from getting in the way.

      {anon coward rushes off to play with his compiler, and if anyone cares, it'll be Java}

      Regards, thank you for the idea

      *
    37. Re:Evolve code. by Anonymous Coward · · Score: 0

      It could also be said that humans can't solve the problem of writing a compiler that can handle the complexity of current architectures. Maybe we need to write a compiler capable of writing a compiler capable of writing a compiler...

    38. Re:Evolve code. by slashdot-terminal · · Score: 2

      "In the general case", perhaps a compiler will produce better code (and certainlty not, as yet, for specialised cases - an inner loop coded direct to PPC macro assembler (PASM on Amiga PPC) is the fastest code I've ever written, but
      humans still write compilers. Now, if they were to force-evolve code using a genetic algorithm, you might get code better than any a human code write, but it'll probably depend on some weird side effect to some obscure instruction,
      or the rtesosnant frequency of your ram bus, or something...


      Evolving code? What actual evidence that this is even within the reach of all but 2 people in the world. I just can't see how exactly this works. Ok I'll shut up about that for this post.

      Generally I think that all that has to be done to make a compiler better or almost anything better is to increase the ammount of brute force checking. For example have a bunch of conditions that in effect say that if you have this and that instructions in C/C++ then you will have this output which can be created in assembly. Now this may increase the size of the compiler but it would allow faster compiler creation. Spending a large quantity of time on something that has almost hit a brick wall is usually never the way to go in this world. What a lot of people are finding out is that to get anywhere close to the level of education that is required is an incalculable frustration. Why do you think that everyone and their mother is going to those "schools" that teach people how to do network admining and such? It's because they want quick money in something that dosn't turn you into Ted Kazanski because of all the mental straining.

      Most likely by the time I actually get the education level to handle this type of thing what I have studied will be obsolete anyway.

      --
      Slashdot social engineering at it's finest
    39. Re:Evolve code. by slashdot-terminal · · Score: 2

      It could also be said that humans can't solve the problem of writing a compiler that can handle the complexity of current architectures. Maybe we need to write a compiler capable of writing a compiler capable of writing a compiler...

      If you think that something like the linux kernel or emacs is complex just try working on one of those little dears. What's wrong with just getting people to actually do the design work themselves. We created C++ and Java for the reasons of creating better code and creating maintainable programs. We must really look objectively and say is creating stuff for lazy people really worth the effort. Do you really want to be out of a job because the computer can just write that new app by itself? Do you really want to be in college for 20 years just to be bested by a machine that can supposedly do better than you? Guess what that leaves for any significant money:

      1. business: generally not very much fun seeing other people make money, thinking about money, scamming the customer, and not too interesting for computer professionals.

      2. Scientific research: Wow can you say compelex! and difficult to get into! Essentially you spend a great deal of time in college which means you have either to get a crappy job washing disher and scrubbing floors or take out thousands of dollars in student loans to get by. Not to mention the very sharp learning curve that it takes in doing these things.

      Now in my case I do not handle stress very well at all. I hate spending large ammounts of time doing anything that is either hard or boring. I have historically not had a very high success rate in solving complex problems which leaves me feeling betrayed by the problem. It's little problems like this that are taxing enough on the human condition that are going to make the task of getting into computers next to impossible.

      --
      Slashdot social engineering at it's finest
    40. Re:Evolve code. by Anonymous Coward · · Score: 0

      Hmm... makes sense... The post that was funny has a score of zero whereas the post that contains only "LOL" is moderated up. Moderation at its best.

    41. Re:Evolve code. by Anonymous Coward · · Score: 0

      Hmm... makes sense... An un-moderated funny post by an AC automatically starts at zero whereas the un-moderated redundant post by the non-AC automatically starts at one. Cluelessness at its best.

    42. Re:Evolve code. by AndrewHowe · · Score: 1

      Well, my post wasn't moderated up, I'm not an AC so I get 1 by default. I do think the funny post should definately be modded up, but I have no moderation points and I have already posted to this article...

    43. Re:Evolve code. by silichrome · · Score: 2

      I've read some of the post for this thread. I should say something P.C. like the processing
      power of the moment increases every 11 months, but frankly its a cop-out. Let me explain
      before you send the flame mail, true it is a fact that according to Moores law, that memory
      (processor also) increases capacity by a factor of 2 every 11 months. But there is also
      another law qouted be system 'engineers' , "the faster the hardware available, the slower the software that operates on the hardware". You can disagree with me, but its a simple fact
      that major system software such as $MS Windows$ requires a more powerful processor, more memory and a bigger harddisk.
      Increasingly developement of programs operate in the megabyte range. My question is why! , At one time ALL programs were limited to 64K segment, because that was the standard size for a ram chip. A "hello world" program compiled in C++(c evil progeny) can easily exceed 20k.
      You probably say BIG DEAL, even coded up in Turbo Pascal the max is 2k. I dont think that I am eccentric because I don't want BLOATED CODE!. Getting back on task, code morphing is a polished version of self-modifing code. I know it isnt obvious on the surface, but think does it really matter if the code that is emitted by the compiler is part of the active process, the end result is code is executed, that wasn't initially envisioned by the program developer.
      I know thgis post is a little off-topic and probably get moderated out, but striving toward
      maximum efficiency should be the focus. Laziness and apathy shouldnt dictated the rule stick
      . Other engineers are required to use the minimum resources to achieve the maximum requirements of the project design, shouldn't computer 'engineers' be able to acheive the same.

  31. JITs and one-time compilers; the good and the bad by ajdavis · · Score: 1

    This isn't inherent to all JITs and code-morphing techniques, but Java does take a performance hit from having much more heap allocation than some other languages. An article in SIGLANG last Spring detailed a Java byte-code compiler that got around this problem. The compiler checked each dynamically allocated object to see if any references could be made to it outside its code-block--this is easy: just check if it is returned from the block (as the value of a function, say), or if its address is stored in a variable of higher scope (i.e., a variable that will outlast the code block). If not, it is compiled as a stack allocation, rather than a heap allocation. This algorithm eliminated %90 of unnecessary heap allocations. The JIT (translating byte-code to machine code) was altered to allow stack allocations like C, and whammo!--a %10 performance increase on Pentiums. There are a few optimizations like this that require a compiler to be able to see much or all of the code at once, rather than doing repeated optimizing passes over small sections of code. I wonder if these compiler algorithms are hard to incorporate in code-morphing techniques? On another note, Java has a lot of performance/memory usage problems that are due more to things like the sandbox-security system, lots of indirection, and the compromises necessary for system-indepence. These problems won't affect a code-morpher that translates from one machine language to another.

  32. FX!32?!? by Anonymous Coward · · Score: 0

    This doesn't seem all that new, actually.

    If I remember correctly, to run x86 NT code on an ALPHA you use FX!32 which translates your code on the fly to the alpha, and saves it, so that the next time the code gets executed it is already translated, and it still looks like an x86 to the program. I believe the translation was also saved to disk, so the more you used your program, the less translation overhead you have.

    Of course this new approach was taken further by re-optimizing code as needed, and providing hardware support for an optimistic translation.

    What will the next iteration be?

    1. Re: FX!32?!? by Anonymous Coward · · Score: 0
      The next iteration can be movement of the code-morping software from software to hardware. That is, instead of having it load from a flash, it could be programmed directly into the an fpga which would sit between the northbridge and the cpu. It would do the translation/optimization for the cpu, saving the cpu's precious cycles for the actual application programs.

      You'll still get that in the field programming they were much touting but you'd lose some of the power savings. Of course, it could be targeted at a system requiring less power savings.

  33. Java Byte Code by infodragon · · Score: 2

    The Curso technology will probably make the Java chip a reality and improve upon the idea all in one shot. Program the morphing code to run Java Byte Code and you have a Java chip which optmises your code. A very clean language that you can easily maintain and not sacrifice execution speed. Every Software development Managers dream!

    --
    If at first you don't succeed, skydiving is not for you.
    1. Re:Java Byte Code by Anonymous Coward · · Score: 1
      I'll agree to most of that. Excecpt for the garbage collection. Andy Armstrong is correct that modern implementations of GC are faster than manual memory allocation. Hotspot and JDK 1.3 use a fully incremental, generational GC - and it is on typical programs faster than malloc/free. The fully incremental is nice for UI smoothness but the generational GC makes it fast on average - beacuse while in C/C++ you have to make an expensive funciton call to free, the generational GC periodically moves the rare long lived objects out of the nursery and then reclaims the entire nursery. Profiling has shown that because most objects are shortlived, this approach saves time for the majority of software. 'Tis true, saftey costs time - but saftey is important. And research is being done to do program analysis that proves which runtime checks can be removed without jepordizing saftey. This is an important area of research. Thread synchronization is costly too - but if you want threads then you can't exactly do withot.

      Slothmonster

    2. Re:Java Byte Code by jilles · · Score: 2

      The only advantage C++ has is it's speed. There is no other advantage. There's lots of disadvantages though.

      Java is easier to code, easier to maintain, platform independent. And that's only the language. C++ is going to be cobol's replacement. The next generation of programmers will make huge amounts of money keeping C++ code running. Don't get me wrong, C++ is a powerfull language in the right hands. Unfortunately it often ends up in the wrong hands: people who never really understood the OO paradigm, people who love to use multiple inheritance. people who think that macros actually improve the maintainability of their code, ... (all the horrible things you can do to make source code unreadable) ...

      You are probably right that java is suitable for web applications but then, what isn't webrelated these days? Standalone programs are a dead end for most domains nowadays. The client server architectures and n-tier architectures in use nowadays make networking a requirement not an optional feature.

      --

      Jilles
    3. Re:Java Byte Code by deltavivis · · Score: 1

      I'll agree that C++ is more mature, but there really isn't too much of a difference between the complexity of the 2 languages. Comparing the core API's between them might make C++ look more complex, but thats an API implementation issue not related to the difficulty of the actual language.

      I have no idea at all where you got the idea that Java has some "inherent use of networking and such"--thats just flat out false. Java is no more network reliant than any other language, it just makes the possibility of connecting to networks a lot easier than most languages. I get the impression from your statement about using standalone programs that you think Java only exists as applets in web browsers, most Java coders stopped making applets a couple of years ago. I love Java but hate applets (never know what version of Java is going to be on the browser, if any, can't be sure how their classloader is going to work, etc.), Java does standalone programs just fine.

    4. Re:Java Byte Code by slashdot-terminal · · Score: 1


      I have no idea at all where you got the idea that Java has some "inherent use of networking and such"--thats just flat out false. Java is no more network reliant than any other language, it just makes the possibility of connecting to
      networks a lot easier than most languages. I get the impression from your statement about using standalone programs that you think Java only exists as applets in web browsers, most Java coders stopped making applets a couple of
      years ago. I love Java but hate applets (never know what version of Java is going to be on the browser, if any, can't be sure how their classloader is going to work, etc.), Java does standalone programs just fine.


      Well if it dosn't *force* you to have networking then it pretty darn well makes it a *heavily* suggested idea. Usually if you wish to run or use a java program you have to have a browser. If I have to have a browser to run that really good new ID game then what you have just done is turn linux into a win98 clone. Plus running a graphical browser is not my system's forte. The only other way that you can create a program that uses java and actually test it is to maybe use some support under the linux kernel for what they term 'java binaries'. I know of a compiler in the gnu series but I have never seen any large scale (or gnu program for that matter) that used java for text based or internal processing for things like scientific components.

      --
      Slashdot social engineering at it's finest
    5. Re:Java Byte Code by slashdot-terminal · · Score: 2

      Java is easier to code, easier to maintain, platform independent. And that's only the language. C++ is going to be cobol's replacement. The next generation of programmers will make huge amounts of money keeping C++ code
      running. Don't get me wrong, C++ is a powerfull language in the right hands. Unfortunately it often ends up in the wrong hands: people who never really understood the OO paradigm, people who love to use multiple inheritance.
      people who think that macros actually improve the maintainability of their code, ... (all the horrible things you can do to make source code unreadable) ...


      I have actually learned/am learing C++ on a collegiate level. I can say that they do not offer Java in the CS department. This is a reflection in the fact that Java is not important for what programmers actually do in the world. There is one class of Java taught in the whole college and that is in a business related sub-optimal computer program. This surely says something about what people really do. I would wager a whole keg of beer that you would find a large percentage of schools who do not use java and a larger portion of software development that dosn't use it. (Around 30-70%). As long as java remains so un interesting and tied to the web and the internet I will never learn it. Haven't you ever tried to access the internet and couldn't for whatever reason? Now look at the number of times you computer has prevented you from accessing the data you need. Now what should happen for most people is that the internet should be statistically more prone to failure than traditional media and far less controllable and customizable for your needs.

      You are probably right that java is suitable for web applications but then, what isn't webrelated these days? Standalone programs are a dead end for most domains nowadays. The client server architectures and n-tier architectures in use
      nowadays make networking a requirement not an optional feature.


      This is quite bad for at least a few reasons:

      1. access: If I need to have more silly requirements then if begins to decrease the ability of that application to actually get something accomplished in a timely manner. When I get a computer do I have to get say Phd in CS do use said computer (contrary to what people would like of the world)? No. This is because it's not necessary. Suppose I have a video game. Does say even dramcast or playstation games or even almost any PC game really *need* to be connected to a network? I can't really see any advantage if I am just a single lone person why that makes the experience any better for me. Almost anything you want control over does not *need* to have access to the web. Does working with say simple spread sheets need web access? How about word processing?

      Also let's look at how the business model works in the internet world for services. Usually the trend has been that the individual has little or no control. This is reinforced with the huge cost of actually getting a dedicated server or bandwidth so that you can control it. Again for the average person (typical consumer who wishes to utilize the computer and has a moderate income of say $25,000-$70,000 USD/yr even if they are schooled in CS) this dosn't help at all in any way. Tell me why I shouldn't have access to all the things that I want to do at my finger tips without the need for someone to control it, charge me for it, change it, or make use of a series of networks which quite frankly are controlled by robber barons who don't care about the little guy?

      --
      Slashdot social engineering at it's finest
    6. Re:Java Byte Code by Anonymous Coward · · Score: 0

      Isn't this what interfaces are for? But I'm only just learning Java (I'm just a caveman. Pity me), and this is what the books say about interfaces... lets one skirt around single inheritance, and lets one skirt around deep inheritance trees... (and don't interfaces seem rather COM-like?), and as a consequence I would imagine that they might make it easier to work into Corba IDL, no?

    7. Re:Java Byte Code by deltavivis · · Score: 1

      good lord, you really have no idea what your talking about do you?

      if you have some java applet (not on a webpage, you've already downloaded the code, have it on disk, something like that) and you want to start it up without a browser type:
      appletviewer TheNameOfTheHtmlFileWithTheAppletTag
      Its been a while since i tried this, but i'm pretty sure this also works across a network, if your browser shy.

      If you have a java application you don't need "java binaries" (like from TowerJ) you just need java byte code. Try typing something like:
      java TheNameOfMyDotClassFileWithAMainInIt
      and see what happens. Holy ShiT! A java program without some patch to the kernel and without turning linux into Win98 by using a web browser...

    8. Re:Java Byte Code by jilles · · Score: 2

      "This surely says something about what people really do.'

      I think it says more about the college you are attending. If the business administration department (the place where they create those clueless marketing types) is the first to adopt Java in the college you are attending I have just one advice: get out of that conservative hole you are sitting.

      "As long as java remains so un interesting and tied to the web and the internet I will never learn it."

      If you don't see the point now you'll never see it. I give up. Happy hacking.

      BTW. have you tried COBOL, I heard 90% of the code in use today is COBOL (just like you no source for my information though) must be a great language, fast too!

      --

      Jilles
    9. Re:Java Byte Code by SamBeckett · · Score: 1

      But Java is important. I can tell you for fact that one major company I worked for is abandoning all platform-dependant software ( statistical packages, database systems) for the platform indepedance of Java.

      The reason your school isn't offering Java is because they are behind with the times and the Prof's aren't out in the real world getting their noses wet.

      I suggest you trasnfer immediately.

    10. Re:Java Byte Code by Anonymous Coward · · Score: 0

      You are totally witless. The usual way to run a Java program is to use a Java interpreter. The only network requirment is to download the interpreter, but you can get them on CD too. As you state many browsers have a Java interprter, but they are not the only or best interpreters. Have a look at www.blackdown.org for Linux Java interpreters. You can get a fulkl devlopment kit with compiler and interpreter or you can get the Java runtiem environment whcih is I belive and interpreter adn applet viewer thingo.

    11. Re:Java Byte Code by wskish · · Score: 1

      what isn't webrelated these days

      Realtime embedded systems. Java doesn't work too well for these. Networking and telecommunications equipment in particular are almost exclusively C or C++.

    12. Re:Java Byte Code by jilles · · Score: 3

      Java bytecode interpretation is not the main cause for java's weak performance. The cause lies in mechanisms like garbage collection, memory allocation, thread synchronization etc. which are expensive. If you'd implement similar mechanisms (i.e. with all the built in safety) in C++ you'd have exactly the same problems (I don't think that's actually feasible though).

      --

      Jilles
    13. Re:Java Byte Code by arivanov · · Score: 5

      No it will not.

      Think of Java licencing and control issues. Discussed widely on slashdot. Actually I shall retract this statement if one of the MAJOR league players will accept Crusoe as a primary CPU for at least one machine class.

      Otherwise it is obvious that it could be thy Java machine, but it is least likely to be.

      I think Cruose will actually make a reality something else which is much cooler than Java. It will make real thy developer's dream - the coat of many colors: The affordable machine of many architectures.

      On the basis of Crusoe even now you can build a machine that can happily emulate:

      Mac, Sun (lower end), IBM PPC (lower end), SGI (lower end), Alpha and curse it x86.

      1. All these are PCI based.

      2. The differences in chipsets can be ignored under the "one OS to rule them all, one os to find them, one os to bring them all and in (oh well cutting out darkness) bind them ". It can have drivers for the chipset and peripherals in question.

      3. You may actually do the reverse thing and develop drivers for the peripherals and the chipset for all platforms in question (not a hell of an effort, actually quite achievable). If peripherals are something like adaptec, tulip and a PCI VGA the drivers are basically there already. So you have only the chipset left. Actually you can intercept these and emulate chipset behaviour if you so desire.

      Overall - the developer's dream can become a reality for just a few bucks - about the price of a PC (excluding licencing for OSes and sowftare of course ;-)

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    14. Re:Java Byte Code by Anonymous Coward · · Score: 0

      Well, Java (not C++) is used in the Software Engineering course at MIT. That sounds like validation enough.

    15. Re:Java Byte Code by thraxil · · Score: 1

      I have actually learned/am learing C++ on a collegiate level. I can say that they do not offer Java in the CS department. This is a reflection in the fact that Java is not important for what programmers actually do in the world.

      i'm a Computer Engineering major at Columbia University and our CS department is in the process of phasing out most C and C++ from the curriculum in favor of Java. while it may be true that currently, C++ is more popular in the "real world" i don't think that it's because of its merits or java's shortcomings. i think it has much more to do with the fact that C/C++ is older, more people know it, and programmers are reluctant to switch from something that they know well to something totally new. but i think that with time more and more programmers will gravitate towards java for many purposes.

      i also think that a good deal of java's reputation as being slow and buggy isn't very well founded. what most people know of java is that they load a webpage with an applet, their system bogs down, then netscape crashes (and possibly takes the system with it if you're unlucky enough to be running an unstable OS). based on their experiences with a few of these bad applets (which were probably written by some junior high kid while sniffing glue) within netscape's oh-so-stable applet environment (remember, this is netscape we're talking about; i've got a function key bound to 'killall -9 netscape' for a reason), they go on to slander java. i've done a significant amount of programming in both java and C++ and i've honestly been surprised at how fast java applications (not applets) run with a decent RTE (running java stuff on solaris is very sweet).

      i certainly do not want to sound like i'm saying that java is inherently "better" than C++. they are both excellent languages with very different strengths and weeknesses. i would certainly never think of trying to do any system programming in java, but at this point i would think twice about using C++ for writing large application programs. (although java does need to improve a lot on the GUI front. Swing? ick.)

      i guess my point is that we should not be so quick to dismiss java as a "real" programming language. its main shortcomings at this point seem to lie not with the language itself or the idea behind it but with things like RTE implementations.

      --
      Smokey the Bear says, "Strip mining prevents forest fires!"
    16. Re:Java Byte Code by pnkfelix · · Score: 1

      I'd say that the having every class extend from Object is a feature of java, not a bug.

      But i do agree that having to downcast all of the time is stupid and inefficient. THat's why there were three research projects going on in parallel to add Generic (Polymorphic) Types to Java. One of them was done here at MIT, but it seems that people tend to like Bell Labs' GJ better (I can understand why)

      Basically, think Templates, but better. C:)



      --
      arvind rulez
    17. Re:Java Byte Code by Anonymous Coward · · Score: 0

      Ever heard of PERC (www.newmonics.com) or Jeode (http://www.insignia.com/4.0/products/jeode.htm)or HP's Chai?

    18. Re:Java Byte Code by moopster · · Score: 1

      My university only teaches advanced courses in Java. In taking any advanced comp-SCI course it is far more important to learn the fundamentals of things such as algorithms, OO, data structures, entity relationships etc. than to spend all day mulling over the syntax of a given program. Syntax is easy to learn... the concepts of OO are much harder to learn, as seen by the people who think they know how to produce OO code.

      --

      ----------
      No army can withstand the strength of an idea whose time has come.
      - Victor Hugo
    19. Re:Java Byte Code by -brazil- · · Score: 1

      This will not result in a performance increase. It has been shown that in general, you get better performance with simple chips and complicated compilers than with chips that try to include high-level concepts; It's this idea that stands behind the RISC concept, and *all* major CPU architectures are RISC (though some, like the x86 ones, disguise as CISC).

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

    20. Re:Java Byte Code by flux · · Score: 1
      You haven't actually tried for example the demos in latest JDK?

      You really -don't- need a web browser to run them, or even the applet viewer. And you're not missing the graphics. I've done some programs that access sql in Java, no web browser involved there!

      Yes, I've done some sql-stuff in perl too, but it's nice to have a concept of 'types' (instead of just 'arrays, scalars and hashes'). This way you can see already in the compile time if something is broken before you run it.. And perhaps that's the reason I like C++ the best of these three: you don't need to subclass everything from 'Object' or somesuch and do dynamic typecasting all the time (templates save your ass). You could say that the types in C++ are more static.. (Which has its downsides too, though.)

      Java's strength is that it has many library functions/classes with it, and you know that anyone else going to run your program have them too. On every platform. If they don't have the packages you use (for example I don't expect my friends to have postgresql.jar), you can just deliver the package with the program and it will work without difficult installing (just make sure CLASSPATH has "." too).

      I think a machine running java as native code would be kinda cool. Btw, there's already a code generator for gcc to output java.. (No, I haven't tried it.)

    21. Re:Java Byte Code by Anonymous Coward · · Score: 0

      Why do people bye this "write-once-run-anywhere" thing. The same thing is true for Commodore64 code, because there is a C64 emulator for every OS out there that has a JRE. Java is not even a really useable language, and many constructs in it prevent (sound programming and) optimization.... ie. Specialization of arguments and return values (like: an intvector that only contains int elements should return int's and only take int's as args), nothing is lost, and finding the possible types that a variable can contain is _much_ easier, that if everything _needs_ to take "Object" to be general enough. The strength of java is nothing of what Java advocates usually say: portability and object orientation -- the strength og Java is in it's standard libraries.

    22. Re:Java Byte Code by slashdot-terminal · · Score: 2

      The Curso technology will probably make the Java chip a reality and improve upon the idea all in one shot. Program the morphing code to run Java Byte Code and you have a Java chip which optmises your code. A very clean
      language that you can easily maintain and not sacrifice execution speed. Every Software development Managers dream!


      I think that C++ is more complex and has a little more maturity. As it stands now you really shouldn't use java for anything more than web related stuff. This is because of it's inherent use of networking and such. Why force the use of network connectivity when standalone programs are just fine. From a standpoint of looking at failure the more points of reliance that are utilized the worse the design. If I have to talk to 10 different people before I can get ahold of say my boss he has put a large degree of failure into his interpersonal communication system. This can be illustrated in events like the Challenger space shuttle disaster where people did not really take anything seriously because of ineffective communications.

      --
      Slashdot social engineering at it's finest
  34. Yup. by TimoT · · Score: 1

    Man can always beat the machine because man can use the machine but not viceversa.

  35. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 1

    Interactive compilation sounds like a great idea, but it can get incredibly complicated/boring real fast. The problem is the compiler must assuming many things to just set up the entry, and exit for loops, etc. You would have to guide the interactive compiler through all this which would get very boring and probably be more confusing. I wouldn't mind a compiler which profiled subroutines, etc. and said 'this is x clock cycles, here is the asm I came up with, do you wanna screw with it?'. *that* would be nice (IMHO).

  36. JIT compiler by link-error · · Score: 1

    Your assuming that the JIT compiler was a Good Programmer w/ Assembler. How much time was spent on conformance rather than raw speed for these JIT compilers?

    --
    -Unresolved symbol? Byte me!
  37. Re:GC by Mr+T · · Score: 2
    It's not sloppiness so much as what C++ let's you get away with. You start building objects and then you have to communicate between them. Do you provide references to objects or do you provide copies of the objects and then update them all when one changes? Once you start leaking references, if they go far enough, problem errupt.

    On large systems, master C++ programmers, master programmers have had these problems in real world applications. It's not so much not knowing when to delete objects so much as making sure you delete enough pieces when you delete them. If you delete an object that is referenced by another then you're going to have a crash. If you delete an object but not the references it contains then you have to have another reference to each of those objects and you have to know what you're down to the last reference so you can delete the object.

    I agree about the sloppiness of thought associated with java and GC but that laziness and the willingness to be sloppy is there regardless of whether or not the programmers are using GC. TakeGC away and they may be a little bit more careful but they've already shown that they are willing to be that lazy and sloppy.

    --
    This is my signature. There are many signatures like it but this one is mine..
  38. Re:java JIT itself is not that slow. by radish · · Score: 2

    3. as WORA is important in java, it is hard for java to use any platform specific resources such as Graphic card's accelerations. that is
    one of the reasons swing's design is getting ugly as it tries to boost its performance.


    Surely it's the JVM's job to abstract hardware acceleration in the same way as, say, DirectX or OpenGL? For instance, my windows JVM might know about directX and so implement the primitive drawing functions within the swing classfiles using the accelerated calls. I don't know if that's what happens, but it strikes me as the logical plan.

    --

    ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

  39. What about the JIT? by Anonymous Coward · · Score: 0

    What about the amount of processing time the JIT takes? The JIT runs as a process to process the instructions. I beleive that would create significant delays to the code (stealing cycle time for JIT process and not code execution). Thoughs?

    1. Re:What about the JIT? by necama · · Score: 1

      still don't like the idea that they're keeping the instruction sets closed. It would seem like if someone out there wanted to port GCC to
      Crusoes native instructions, that would be good... But they just don't want to be percieved as being at all incompatible with Intel, i guees


      Not quite right. The way that Crusoe got the massive reduction in transistors on the die is by moving lots of processor functions off of the processor and into software. Things like branch selection and out of order execution aren't done in the processor anymore. All of that is done in the software now.

    2. Re:What about the JIT? by um...+Lucas · · Score: 4

      I read that the 700 MHz Crusoe chip actually only attains the performance of a 500MHz Pentium (II or III, I forgot - oh, can Crusoe emulate SIMD? Not like it matters, but I'm wondering).

      So there you have it - code morphing, or just in time compiliing or whatever gives you about 70% of the performance precompiled code.

      That's not to downplay Transmeta's theoretical accomplishment. Those extra 200MHz go to things like monitoring it's VM, throttling the clock, etc. So in the end you lose 30% of the performance but make it up with 35X less power consumed.

      It's a great trade off for laptops, handhelds, etc. But not for workstations, servers, etc...

      I still don't like the idea that they're keeping the instruction sets closed. It would seem like if someone out there wanted to port GCC to Crusoes native instructions, that would be good... But they just don't want to be percieved as being at all incompatible with Intel, i guees.

    3. Re:What about the JIT? by BinxBolling · · Score: 2
      I read that the 700 MHz Crusoe chip actually only attains the performance of a 500MHz Pentium

      So there you have it - code morphing, or just in time compiliing or whatever gives you about 70% of the performance precompiled code.

      Comparing interpreted to compiled code on a MHz-for-MHz basis makes no sense, because MHz is not an intrinsically interesting number. Most people don't care about what their computer's clock speed is. They care about how fast it runs applications (i.e. MIPS), and how much it costs. So don't compare Crusoe to an Intel chip on the basis of a MIPS/MHz ratio. Compare it on the basis of a MIPS/$$$ ratio. How well does a Crusoe perform compared to an Intel chip *of the same price*?

    4. Re:What about the JIT? by BinxBolling · · Score: 2
      Comparing interpreted to compiled code

      Complete brain fart. That should read "Comparing Crusoe to Intel..."

    5. Re:What about the JIT? by BinxBolling · · Score: 1
      You are right that comparing MHz-for Mhz doesn't make sense, but what the original poster said was that the 700 MHz chip attains the performance of a 500 MHz Pentium. That sucks, right? Who wants to buy a chip with that level of relative performance?

      You seem to have missed my point entirely. No, a 700MHz chip that performs like a 500MHz Pentium does not suck, if its price is similar to or less than that of a 500 MHz Pentium. Why should I care that a chip with P/500 performance actually has a 700MHz clock, as long as that chip costs no more than a P/500?

      Or are you simply trying to say that P/500 performance sucks?

    6. Re:What about the JIT? by Anonymous Coward · · Score: 0

      Well, you know, Apple has been doing this for a long time. If you look at their clockspeeds on the newest chips, it seems as though they're playing catch-up with Intel and AMD, but they've done a hell of a job abating the MHz-hungers of people. This whole G4 "velocity-engine" is bull, but those diagrams with the bits and the bottlenecks and the whole schpeil they have in magazines is brilliant. My 12 year old sister can get it, for crying out loud. If Transmeta can get some real marketing muscle, they can end the whole "MHz War" and begin an all new "bit war" or "code-morphing war". Let's see another 1984-like commercial! I wanna see some Bill Bashing! Come on, people!

    7. Re:What about the JIT? by um...+Lucas · · Score: 1

      Smart people care about the things that you mentioned (price, speed). Unfortunately, we're not in the majority.

      John Q Public wants the either the biggest megahertz computer or the cheapest one possible. Those are the only two things they really care about.

      They're going to feel quite hosed when they sit and think "Wow. My 700 MHz machine is slower than his 600 MHz machine. That sucks!".

      If people didn't care about megahertz, why a few years back, did Cyrix and AMD market their chips as P133 or PR150's? Because they needed to have a speed gauge. Especially Cyrix, whose chips ran comparatively slower, but still got the equivalant performance of the faster chips.

      If Crusoe does appear in machines where people have a say about which processor is in it, either Transmeta or whoever is selling the machines is going to have to do a damn good job explaining to customers why those couple hundred megahertz disappeared. If they make it understandable, Transmeta could blossom. But if they over look it, people will shy away from their machines, thinking that they're *much* slower than intels offerings.

    8. Re:What about the JIT? by erikn · · Score: 1

      I still don't like the idea that they're keeping the instruction sets closed.

      Well, from ebcdic's comments in the first Trasmeta story, Transmeta just wants to be free to muck with the underlying instruction set. Seems more like a data hiding thing than a political move. And this makes sense as a justification to me. Cooler things in the future.

    9. Re:What about the JIT? by bab5freak · · Score: 1

      You are right that comparing MHz-for Mhz doesn't make sense, but what the original poster said was that the 700 MHz chip attains the performance of a 500 MHz Pentium. That sucks, right? Who wants to buy a chip with that level of relative performance?

      What most people are missing is that the major advantage of Crusoe is its really simple VLIW hardware. On current x86 processors, the issue logic (which a VLIW doesn't have) takes up a substantial portion of the chip area. By removing that extra hardware, VLIW chips can be smaller and faster.
      The point is that while a 700 MHz chip with the performance of a 500MHz chip doesn't sound so great, just think about when Intel tops out with 1.2 GHz and Transmeta can still crank up the speed (2 GHz, more?) Then, even with a 20% performance hit, the Crusoe still outruns the fastest Pentium.

    10. Re:What about the JIT? by nester · · Score: 1

      if you knew anything about altivec at all, you wouldn't call it bull. it is simply the best simd/vector system out there (not counting crays and stuff). i suggest you go to moto's site and read up on it.

    11. Re:What about the JIT? by Sketch · · Score: 2
      "I still don't like the idea that they're keeping the instruction sets closed. It would seem like if someone out there wanted to port GCC to Crusoes native instructions, that would be good... But they just don't want to be percieved as being at all incompatible with Intel, i guees.

      The reason has nothing to do with Intel compatibility, it has to do with the nature of VLIW. As someone else mentioned above, every new VLIW CPU from Transmeta will be different (the two CPUs they announced have different cores) and require totally different optimizing, possibly even having a different instruction set.

      What Transmeta has done is remove the primary problem of VLIW...the fact that you need a very specific compiler for your CPU to extract performance out of it. By including the "compiler" in the CPU, it's done for you at run time, so you won't have to recompile your code every time you get a new CPU.


      OpenVerse Visual Chat: http://openverse.org
      --
      -- OpenVerse Visual Chat: http://openverse.com
  40. Lagoona is an interesting development from Oberon. by Anonymous Coward · · Score: 0

    I read the Lagoona PDF essay. Is a Lagoona system currently available? I'll look at the site again. This is the first time that any Oberon-style language has shown up in this thread, although Oberon is VERY relevant to this dicussion. Oberon is type-safe, garbage collected, can run as an OS on bare Intel hardware, is very fast, and compiles 10 time faster than C++. I haven't implemented this yet, but it should be easy to write an Oberon program that constructs a code fragment, calls the compiler, and imports the compiled code into the running program. Look at http://www.oberon.ethz.ch/native/ . There's a version that runs on top of Linux, as well as a pure native one.

  41. Re:Self-Modifying Code by ZoeSch · · Score: 4

    Nevertheless I'd like to point out that JIT is based a somewhat dangerous technique: a program that alters (its own) code. I believe this technique was used in the eighties to scare off hackers by making code incomprehensible and hard to disassemble until the program was actually running. Also (even on a good old 6502 processor) it's possible to make some speed improvents peeking and poking into the code you're actually executing.

    Not at all true... self modifying code is still around not only on JIT compilers but in interpreted languages and even device drivers (Or how do you think some device drivers adjust in real time to hardware changes?). And it's not a dangerous technique in itself, but (As usual) when executed improperly it can be a wild beast (As most people discovered when thy installed their first Cyrix and AMD 486 processors and discovered most x86 code wouldn't run properly because of cache integrity issues)

    When compilers for Microcomputers got faster and most processor architectures (known as Harvard architecture, I believe) explicitly require a division of RW and RO memory segments, self-modifying code was abandoned...

    First of all, Harvard machines require separate instruction and data pipelines and memory spaces, and 99% of the CPU's on the market (General, Embedded, etc.) are Von Neumann machines which use a single space to store instruction and data. The thing is, newer CPU's (Even if they're Von Neumann desings)have separate caches for data and instructions and most self modifying code violates the cache integrity (See above) because the code is modified in the cache but never stored back into memory (Unless you use a write-through cache design which is impractical from the performace point of view).

    BUT (And that's a big BUT :) if your JIT VM forces periodic cache writes to RAM (Not every time but enough times to ensure some sort of coherence if the cache and RAM are out of sync) you lose a bit of performance but gain a stabler setup.

    Disassemblers for JIT won't be as complex as a JIT assembler just because when you disassemble a piece of code you treat is under the black box principle (What goes in and what goes out) in order to derive the fundamental principles and algorithms, which can be implemented in 1E999 different ways, so if you disassembled the less optimal morph, bad luck... disassemble another test run and see if you get a better implementation of the algorithm. Or you could use the aforementioned principle and implement your own algorithm.

    JIT code/compilers, BTW are also being tested to produce self modifying chips (ASIC's and FPGA's) under VHDL/Verilog using also NNets or Gen. Algorithms to obtaing a first implementation and then using the JIT to optimize it...

    Really interesting stuff (The Transmeta Crusoe) and my bet is that soon other companies will follow altough not for the same reasons as Transmeta :)

    ZoeSch

    --
    I hate to agree with davecrazy but...
  42. The obvious answer... by nhowie · · Score: 1

    The obvious answer would be that the JIT/VM code takes up clock cycles to convert/schedule/optimise the code, so there's always a significant overhead (JIT removes unnecessary (sp?) conversion, but there's still some overhead).
    --

  43. Re:heap and stack by tagish · · Score: 1
    I didn't exactly mean that Java should use the stack, just something that looked like a stack. In its most basic implementation a stack is a pointer that marches through memory advancing by the size of each object that is placed on it. Of course it is usual to pull things back off the stack and you can only normally do that in reverse order.

    malloc() / free() in conrast have rather more work to do. malloc() has to search through a number of data structures to find an appropriate chunk of memory, and free() has to return memory to the heap in such a way that the heap is still usable.

    My point is that when you have a garbage collector that can take an overview of the whole heap, your allocation strategy can be much more simple. The most simple possible case would be to have a pointer (c.f. stack pointer) that simply moves forward through memory by the size of each block allocated. Then all the work falls to the garbage collector which has to sort out a heap that may potentially have a number of gaps in it (corresponding to objects that have gone out of scope in 'non stack' order).

    In practice garbage collected memory manager that worked in this way would be a little naive, but I would still maintain that a GC system should be able to achieve better performance than a malloc() / free() system because malloc() / free() have to micro-manage the heap, keeping it in a usable state after each call whereas a garbage collector can do all its heap optimisation at once.

    Actually I'm cheating a bit here, so I better come clean before someone else points it out. malloc() / free() can actually enjoy all the benefits that I attribute to a garbage collecting memory manager if you implement them like this:

    void*malloc(size_tsize)

    {
    /*getmemoryfromagarbagecollect edstorageallocator*/
    returngc_alloc_new(size);
    }

    voidfree(void*mem)
    {
    /*donothingatall*/
    (void)mem;
    }
    In other words you can implement malloc() / free() on top of a garbage collector. This is what happens with the garbage collector libraries for C++; new() just calls the GC's allocator and delete() does nothing.

    --
    Andy Armstrong
  44. within X% of the performance of native code by The+Pim · · Score: 1

    As the microkernel unix people used to say before they finally died out, "We are within X% of the performance of the original". - Alan Cox The JIT guys can write all the papers in the world about how they should (yes, really should) be able to get native code speed. I'll believe it when I run Java bytecode and it's not slow.

    --

    The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
    1. Re:within X% of the performance of native code by John+Allsup · · Score: 1

      The point is being able to write 10x quicker, and have everything run within x% of the original.

      This is an important point in practice -- do you want to spend ten times more in order to get something 5% faster? The idea of microkernels, abstraction layers, etc. is to make other peoples lives easier, whilst still being fast enough.


      John
      --
      John_Chalisque
    2. Re:within X% of the performance of native code by MikeBabcock · · Score: 2

      Which reminds me of how big a smile I got when I read Transmeta's processor marketing where they said that efficiency was more important to them than performance and it would not be as fast as a native CPU, but close enough to cut it.

      --
      - Michael T. Babcock (Yes, I blog)
    3. Re:within X% of the performance of native code by spinkham · · Score: 2

      Well, Crusoe was designed for low power use, then they tried to get speed out of it..
      "Efficiency"=Power efficiency, not speed efficiency.
      It is slower then other x86 chips per mhz, but paster per watt. Therefore, they have done a good job at what they wanted to do. No, it's not an alpha or a k7, but it wasn't designed to be.

      --
      Blessed are the pessimists, for they have made backups.
  45. Pretty technical? by Anonymous Coward · · Score: 0

    This site really *is* aimed at users isn't it! If that's technical, my name's Rumplestiltskin.

    1. Re:Pretty technical? by robix · · Score: 1

      I am sorry, but this comment was ridiculous. Obviously if this anonymous coward has spent any time on this site they have noticed that there are a lot of technical and programming references. I personally love it. so there!

  46. Imperative languages are hard to reason by Chuan-kai+Lin · · Score: 3

    I do agree that machines should spit better code than humans do, but we are just not there yet. When will we get there? Maybe tomorrow, maybe next month, maybe next year, maybe next century. Nobody knows. Suppose we do get there tomorrow, I believe it will not be Java + JIT that made it, for the following reasons:

    1. Java is an imperative language, which are very hard to reason about because of the lack of referential transparency. Yes, I know that it is a known fact in computation theory that it is impossible to reason about arbitrary programs, but the fact is that we do not write arbitrary programs. Imperative programs with destructive updates are hard to reason about, and as a result we are almost always restricted to local optimizations. Local optimizations are great, but I am afraid that that alone will not be enough to break the barrier.
    2. Vast amounts of information is lost once the source program is compiled into bytecode, therefore making effective optimization much harder. You can tell the intention of the programmer from the source code, but it is hard to do so for machine instructions (or even preprocessed source code, which anyone who had read the nVidia XFree86 source code must agree). The JIT compiler must create code that acts exactly the same as the bytecode, and that restriction severely reduced its options.

    If that breakthrough were to come tomorrow, I believe it is most likely to come from the functional programming people. I believe that is the way to go for the future.

  47. heap and stack by kmcardle · · Score: 2

    I don't see where your argument about placing memory for new() on the stack holds up. The point of dynamic object creation is to allow the program to create an aribitrary number of objects and then free() them when done with them.

    Trivial example that really makes the stack argument fall apart:
    Let's say I have 500 objects new()ed. I'm assuming that the first object created would be on the bottom of the stack, and the last object created would be on the top. I now delete the first 499 items. I now have 499 * object size bytes of memory on my stack that I can't use becuase I still have the 500th item on the stack.
    I can't recover the memory because it's on the stack.

    Am I missing your argument? Help me out here.
    --

    --
    then it comes to be that the soothing light at the end of your tunnel is just a freight train coming your way
    1. Re:heap and stack by Anonymous Coward · · Score: 0

      I really don't understand this thing, algorithms for allocating and freeing memory do take into account the whole memory situation, the algorithms do some complex things because it can't know when and what order things are going to be freed in advance. I guess I'm saying that *something* has to micro-manage the heap, your simple example is naive (like you said) because now the garbage collector is micro-managing the heap (and with this "stack" thing, doing it less optimally than current methods for memory allocation/deallocation).

    2. Re:heap and stack by kmcardle · · Score: 1

      You're just shifting the work around a bit. The same work has to be done, regrouping free space, you're just trying to defer it until a convient time. I can agree with that.


      --

      --
      then it comes to be that the soothing light at the end of your tunnel is just a freight train coming your way
    3. Re:heap and stack by Anonymous Coward · · Score: 0

      But malloc() and free() aren't system calls, and aren't implemented the same way on all systems.

    4. Re:heap and stack by link-error · · Score: 1

      I think he was giving just a sample of the new() function. I believe his point was that you could effeciently manage your own memory without the overhead of excessive system calls.

      --
      -Unresolved symbol? Byte me!
    5. Re:heap and stack by Anonymous Coward · · Score: 0

      How does that work if a reference to Bar is created outside the scope of the routine or if it is returned?

    6. Re:heap and stack by ucblockhead · · Score: 1

      It doesn't.

      Which is why the compiler (or in the alloca() case, the programmer) has to be smart enough to know not to return a reference (or assign it to a global, etc.).

      --
      The cake is a pie
    7. Re:heap and stack by Anonymous Coward · · Score: 0

      Regrouping free space can be done by merging dead objects or moving live ones (and simply overwriting the leftovers). The former reclaims heap sooner and so needs less memory, but the latter wastes fewer cycles if most objects die early (which is likely in functional or OO style). More sophisticated GC is generational- since almost all objects die very quickly or live a long, long time, surviving objects are moved to a "tenured" heap that's collected (copied) far less often.

    8. Re:heap and stack by scruffy · · Score: 1

      It's easy. Just implement the stack as a linked list instead of an array.

    9. Re:heap and stack by ucblockhead · · Score: 2

      I think that what he is saying is that sometimes you could gain a lot of speed in java by allocating on the stack. This has been one of my biggest problems with Java, the way everything is done in the heap, not the stack. If an object is only used in one function, (or functions it calls) it makes perfect sense to put it on the stack, not the heap.

      (I'd post an example, but I'm not a Java guy, and don't want to screw up my syntax in public.)

      I used to do something like this in C, manually. Suppose you have this:

      void Foo(int y)
      {
      char *Bar=malloc(y);

      /*dostuff*/

      free(Bar);
      }

      With some compilers, you can save time and avoid memory leaks by allocating on the stack like this:

      void Foo(int y)
      {
      char *Bar=alloca(y); /* Compiles roughly to Bar=sp, sp+=y (or is it sp-=y?)*/

      /*dostuff*/

      /*No free needed, popped off of stack.*/
      }

      I'd think that if a Java optimizer were smart enough to know when to do this, it'd speed things up tremendously for some programs.

      --
      The cake is a pie
  48. provide some examples then... by SEAL · · Score: 2
    "garbage collection is slower than hand coded memory management" - You really ought to read some of the latest garbage collection papers. This is no longer true even for collectors managing C or C++ allocation. When given an environment like Java which doesn't allow pointers and other antiquated memory techniques a good dynamic compiler with a modern garbage collector is both faster and more efficient than any hand coded attempt on all but the most simple of applications.

    Now, I have yet to see anyone provide concrete examples of a garbage collector performing better for the general case, than standard memory management. However, let's assume that the garbage collector provides a small performace gain overall.

    Unfortunately, as others have pointed out, compilers don't understand the objective of the program they are dealing with the way assembly coders do. The same is true for a garbage collector.

    I can speak from experience as a game programmer here. There are certain parts of a game which are time and performance critical. How can I trust a garbage collector not to slow things down in that section? Well, I can't. There has to be some form of programmer interaction telling it not to cleanup in section X of the code.

    This isn't very practical for games though. Typically we want full control over the memory. We know the upper limit of what we need, and we don't care much about other applications. So we can allocate a large chunk at start time and then work within that, to avoid the new/delete/malloc/free overhead.

    If removing "antiquated" things like pointers is the best solution, then we'd all be writing games in Java. OK well that's a stretch but you see my point...

    Best regards,

    SEAL

    1. Re:provide some examples then... by Vidar+Hokstad · · Score: 1
      Obviously you haven't read up on modern garbage collectors. Most have functions to inhibit collection during timing critical phases. Many also support restarted collection, and can have time limits imposed, to force it to never use more than a specified amount of time before giving up control.

      For your game example, you are doing something wrong if you are doing allocations in the middle of timing critical code anyway, as you have no way of knowing that the allocation is done in a speedy way. In that case it's better to preallocate the memory you need, and inhibit garbage collection during the timing critical parts.

    2. Re:provide some examples then... by BurdMan · · Score: 1

      SEAL,

      Two final notes. Read up on the concept of "pause clustering" used by the Self language. Pause clustering is a technique used by schedulers to measure the actual lag time of many very small system overhead tasks as perceived by a user. Also look into the Train GC algorithm. It guarentees collection in a bounded amount of time. Put the two together and tie them to the scheduler and you have a system which can meet the needs of that tight loop and still do GC if need be. Games don't make sense in Java due to its poor graphics architecture and unrealistic multimedia extensions.

      thanks for your comments,

      BurdMan

    3. Re:provide some examples then... by Anonymous Coward · · Score: 0

      Actually the modern good GC works well for you here too. If you do your preallocate stategy then GC will not run. Also modern GC is fully intermittent. It can be interupted with real time constraints. Go read some other the numerous research papers on the subject or go look at JDK1.3.

  49. Why are compilers so slow to update? by Anonymous Coward · · Score: 0
    Putting aside the speed debate of compiler vs asm coder, I think there's a more important question for the compiler crowd: when new CPU technologies come out, why do compilers tend to lag so much?

    It was years before gcc noticed the Pentium existed. All the new x86 technologies like MMX/3DNow/KNE(Pentium3) are barely supported in either gcc or MSVC++, which together, probabls account for a majority of all x86 code compiled and run. No, I'm not volunteering to do any work to fix this, but it does seem wrong. In the real world, Intel threw $$$ at Cygnus to get their Merced supported.

    Frankly, I think Transmeta may be onto the right track here: they ship a good compiler in every chip, so they don't have to wait for the crucial (software) compiler technologies to catch up.

    Is it just that compilers are inherently big ugly nasty pieces of code that hate being mucked with? Is it a lack of people wanting to work on something 'unsexy' (and producing 15,000 window managers instead)? As long as compilers really are *BEHIND* the curve in codegen vs human asm coders (who'll use all the instructions), compilers will continue to lag.

    Nathan Mates

  50. Java Performance is due to the language itself... by Tom7 · · Score: 1


    Just like to point out for the (n+1)th time that the Java language itself is the main reason that Java performance is "bad". In particular, function dispatch (we have to crawl up the class chain to find the real function), runtime tag checks (every time we downcast), tag creation (so we can tag check), null pointer checks (very often). This stuff can NOT be helped by compilers (it's a losing battle since most of the time you are dealing with noncomputable problems). A big clue to this should be the fact that the language designers special cased ints and chars to NOT be objects.

    Certain "better" type systems (there are usually tradeoffs) don't require such nonsense, and are limited only by compiler technology.

    When compiling a statically-typed language like (insert your favorite here), the compiler can often do automatic theorem proving to generate much better code. In the next few years, this will come a lot farther (compilation of traditional languages is near the state of the art).

    By the way, I do agree that compilers generate better assembly code for languages like C, but only because of the sheer hugeness of the code. Hand-coded inner loops (by someone who knows the architecture well) will beat compiler code any day. If you've ever studied the code generated by a compiler, this should be clear to you. The main reason being that the compiler can't make an optimization that it can't be SURE doesn't change the behaviour of a program... stuff like the possibility of pointer aliasing and linktime magic with function calls means a lot of wasted optimizations. But a programmer is often much more easily able to convince herself that certain situations can't arise -- because she knows how the program works. Most advanced programming languages remove a lot of the ambiguities through type systems, allowing for much more powerful automatic optimizations.

    - Tom 7

  51. Static allocation vs dynamic by Anonymous Coward · · Score: 0
    Current processors don't take a huge performance hit when working with objects who's address is not known at compile time; in fact in many architectures it makes no difference at all.

    In general this is not true. First, an address calculation takes at least an addition operation. Even in a parallel-execution processor where such an address calculation can be performed simeultaneously with other instructions, the address calculation must be performed before the memory access. This prevents out-of-order execution of the instructions, and prevents pre-fetching from the cache at least until the address is calculated. There are some cases where it might be the same speed (the cpu can execute operations in parallel, the address generation does not block any other instruction from executing, and the requested object is already in the cache) but in general dynamic memory referencing is slower than static allocation.

  52. Programming is for engineers... by Anonymous Coward · · Score: 0

    ...not for twits with ego problems.

  53. JIT by any other name by TheDullBlade · · Score: 3

    "1...not a JIT system but a dynamically optimizing compiler" a slow start is a big problem (which is why they are emphasizing long-term server programs), but that's not the main issue. There's no reason that profile-based optimization can't be applied to C programs. That there's an experimental JIT (yes, it's still a JIT) that's ahead of the C compilers in common use is no reason to claim that the JIT strategy is better.

    "2...as long as the results of the operation don't change" Which means that if the representation is different in a meaningful way they have to check whether the results of the operation change, reducing the efficiency, which was my point.

    "3...on all but the most simple applications" But I rarely do anything but the most simple applications. I allocate an array, then I free an array. I do this for hashes, trees, stacks, queues, and pretty much any data structure I use. Only when I am being extremely lazy will I allocate and free objects recursively. Modern CPUs with slow main memory and fast cache like it when you keep all the data together, so I generally do so. I also try to access it linearly, which they also like. The point holds.

    Don't try to compare the most primitive and unoptimized methods of custom memory management to the most sophisticated garbage collection algorithms. Remember, too, that a person doing custom memory management can write the same sophisticated garbage collector that Java would use.

    And don't confuse "primitive" with "fundamental". Pointers are fundamental. Having to analyse reverse polish code function to prevent dangerous operations, rather than making it impossible to construct operations you don't want, is primitive.

    "4...then tell me that dynamic dispatch is a problem." Okay, dynamic dispatch is a problem. Just because it can inline methods here and there doesn't mean you won't take an overall performance hit. Besides, that was only one example of the ways the lousy Java handholding slows things down.

    The Sun marketing papers you linked me to are just more of the same old Java hype. They'll go on to "prove" that their new super JIT is better than anything you could code by hand by comparing optimized idiomatic Java with the same code translated poorly into unidiomatic C++.

    Idiomatic C written by a competent optimizer will still blow the Java version out of the water.

    JIT by any other name smells just as bad.

    --
    /.
  54. Very interesting, but not why I pick Java by evilpenguin · · Score: 2

    This is very intersting stuff, and I do care about these questions, but virtually none of this enters into my selection of a language for a task.

    When you are writing a routine that reads single keystrokes and puts them in a buffer, you really don't care much how optomized the loop is unless it takes more time than elapses between the user's keystrokes.

    In other words, I'm very glad people are working on smarter compilers and Java optomizations, but I select Java for one reason alone:

    It is easier to write more code more quickly with fewer bugs in Java than in C or C++.

    Most of us in MIS find ourselves in situations where we can throw more hardware at things much more cheaply than we can pay people to write and debug code.

    Certainly there are situations where we are prcoessor-bound, or where data comes from a high volume network source and then we must consider performance, but since in many cases even a slow language is "fast enough" for the maximum throughput, the ease and reliability of development come to the fore.

    So, if I want a fairly portable device driver, I write in C. If I want a stable application, I choose Java. For me it is this feature of the language, not its speed (or lack), not its "write once, run anywhere," but rather the ease with which stable code may be written that draws me to Java. After that, I am concerned more with the footprint of the VM and its impact on non-Java system components than I am with the optomization of Java byte codes.

    I say this not to disparage this discussion, which I really do find enormously interesting, but I must say that for many of us, this is the computing equivalent of counting angels on pinheads.

  55. Turf war - morphing vs. JIT by homunq · · Score: 2

    I'm surprised nobody else has commented on the obvious here. Sure, maybe dynamically recompiling is awesome. But doing it twice at once isn't. Just imagine your Crusoe chip screaming in agony as the code-morphing tries to profile and recompile code that is being dynamically recompiled by the JIT compiler. And how do you fix that?

    By abstracting your two smarty-pants recompilers into a "hardware" and an "interpreter" layer, you've removed any way for them to talk to each other or even really be aware of the other's existence. The problem's not theoretically impossible, but the practical advantages of keeping your abstractions clean make me worry about the hopes for Crusoe web-pads (which would presumably run a lot of Java).

  56. Re:Compilers dont write better code than humans by GnrcMan · · Score: 1

    The fantasy is that you're some super studly asm c0der who can not only produce properly scheduled code, but al...

    Thank you, I couldn't have said it better myself.

    --GnrcMan--

  57. Hey, that's insightful by Anonymous Coward · · Score: 0

    Why don't we just ban all programming languages and go back to flipping switches. That'll guarentee job security.

  58. offtopic but... by Shin+Elendale · · Score: 1
    Now I have a question. One of the tricks the Transmeta processors use to get faster is locking the OS into ROM memory. This is nice and good, but in another 3-4 years I'll want to use the next version of the OS to get better drivers, more efficient code, etc. With the Transmeta chips wouldn't it be impossible to upgrade? Please correct me if I'm wrong because I really want one of those things!

    -Elendale (I know the driver is a bad example, so don't correct that!)

    --

    IANAT (I Am Not A Troll)

    1. Re:offtopic but... by Hallow · · Score: 1

      it's called flash rom. Just like you can upgrade your pc's bios that's stored in rom, I'd imagine you'd be able to upgrade.

      ROM isn't really ROM anymore. It's more like SROM - sometimes read only memory.

  59. Re:No. Hand coded assembler is faster. by nikh · · Score: 1

    I guess it was inevitable that this debate would lead to this kind of argument, but somehow I think most people have missed the original point... If you tell a hardened ASM programmer that a compiler can produce better code than they can, the knee-jerk reaction will always be, "No way!" But we're not talking about static compilation here. The subject is runtime compilation, actually producing code during execution.

    If you are writing any sort of genuinely generic code, then the likelihood is that you will not be able to envisage every possible use that the code will be put to. What this implies is that as a programmer, you cannot know the exact circumstances in which your code will be used. A run-time compiler knows the exact circumstances, including how many registers a calling function may have free, how frequently a particular piece of code is executed, argument and loop invariants, and so on.

    This knowledge allows the compiler to improve the efficiency of argument passing, memory usage, cache usage and various other things dynamically according to the circumstances of execution, and it is knowledge that the original programmer simply cannot know. The only way a human programmer can perform these kinds of optimisations is through making assumptions as to the code's usage, which defeats the purposes of generic coding.

    Think along the lines of writing a piece of code, then seeing the compiler product different chunks of machine code from the same source, depending on its usage. Maybe the compiler notices that a particularly frequent part of the program consistently calls your function with the same parameters. It can use this knowledge to compile your program with the constants in mind, thus saving register usage and parameter passing overhead, for this particular case. For a human programmer to perform this kind of optimisation, they would have to analyse every possible use that their function was put to. That simply isn't practical for non-trivial programming projects.

    If we really want to debate the implications of run-time compilation, the question should not be whether a human programmer can optimise better than a static compiler. We should be asking whether the advantages a run-time compiler has over static compiler are enough to outweigh the overhead that is incurred by the compilation process.

    Someone pointed out that a 700MHz Crusoe processor executes code at approximately the same speed as a 500MHz Pentium III. It sounds like quite a big overhead, but it would be a more meaningful statistic if we knew how fast a Crusoe processor would execute entirely native code...

    Nik.

  60. Re:that's what longs are for by AndrewHowe · · Score: 1

    OK, but how about this scenario: I want to hold a number from 0 to 1,000,000. It's too big for the lowest common denominator (16 bits) so I whack it in a long. On a machine with 64 bit ints, I just went sub-optimal.
    So the whole idea of 'int' as this optimal size is fairly useless in practice, IMHO.
    What I really want to do is tell the compiler "I want to store 0..1000000" and let it sort it all out. Then I can decide whether I want to compile for size or speed [at runtime, in the spirit of this article], or if I definitely want that value in a long because some other code needs it to be that size.

  61. GTK vs Qt by Anonymous Coward · · Score: 0

    Flame me if I'm wrong but I think that's the same compromise:
    Qt is easier to hack with and much more intuitive to program. As a result u develop apps quickly.

    in Gtk OTOH, under the toolkit there's an implementation of OOP in C, thus you have to follow a protocol to code Gtk apps. Its more complicated but you can control everyting.

    In short term, obviously Qt becomes more rentable, but I think in long term Gtk apps will stand out as they will let programmers more control over the underlying code.

    -----------------
    Don't take me too seriously, I'm more a philosopher than a programmer

  62. Re:Profoundly counterintuitive? by Anonymous Coward · · Score: 0
    I disagree. That is what people have been told over and over again, but this is wrong. For example, you can solve the 8 queens problem in 500 assy lines, and there are a lot of problems in that complexity range.

    Wow, that's really useful. I mean, just the other day, I was thinking to myself "Man, what I really need is a blisteringly fast program to solve the 8 queens problem."

    And the fun factor is incredibly high.

    Different strokes.

  63. Native code will always be faster by Anonymous Coward · · Score: 0

    Whatever methods that may be used to improve bytecode speed/efficiency can be (and will be) applied to native executables. Therefore any speed gains made by Java/bytecode programs will also be realized by native binaries. That leaves JIT compiled programs with extra overhead and therefore slower.

  64. OT: computers at blitz by Daniel · · Score: 2

    Actually, computers tend to be even better at speed chess than at regular chess. The reason is that computers analyze games by expanding the game tree as far as possible. Because the chess game tree grows exponentially, this leads to diminishing returns as the amount of time increases; if you watch the "ply searched" indicator on a chess program it'll generally jump to somewhere between 3-5 moves almost immediately and then appear to stall. However, even this shallow game tree is enough to catch an unsuspecting human in a tactical trap.

    The reason humans can still beat computers is that they can apply long-range strategic planning -- "what are the best ideas to pursue in this position?" -- and rules of thumb or experience about what sorts of things are likely to happen in a certain type of position. Working from this, they then decide on a small set of moves to evaluate, usually in a narrow tree (because humans can generally discard moves which are irrelevant out of hand; I believe some better computer algorithms try to do this as well but I'm not sure) This approach doesn't really start to pay off until the human has had some time to think, but once it does it can allow the human to catch up to the computer, which is still examining every possible outcome of shuffling the rooks back and forth.

    Disclaimer: all of the above is empirical guessing and may not bear any resemblence to reality.
    Daniel

    --
    Hurry up and jump on the individualist bandwagon!
    1. Re:OT: computers at blitz by Repton · · Score: 1
      because humans can generally discard moves which are irrelevant out of hand;

      They did an experiment, once: Take a chess board, show it to a novice and to a master for 5 seconds, and then ask the players to reconstruct it on a blank board.

      The novice got it mostly right, but with some pieces translated a few squares. Whereas the master tended to translate groups of pieces, to produce a game that looked quite different, but was tactically extremely similar.

      (it seems you're right about the narrow tree, too... For a good player, bad moves simply don't exist: They don't get considered and discarded, rather, they are never even considered)

      (information comes from Goedel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter. If you haven't, read this book :-)


      --
      Repton.
      --
      Repton.
      They say that only an experienced wizard can do the tengu shuffle.
    2. Re:OT: computers at blitz by Anonymous Coward · · Score: 0

      I'll give you a +1 just because you read GEB (and mentioned it)
      :)
      You're welcome.

    3. Re:OT: computers at blitz by Daniel · · Score: 2

      I haven't heard that, but I have heard of extremely similar experiments, where weak players and strong players were given a similar challenge, but with both positions from real games and randomly generated positions. The novices did equally well (or poorly) on both types of positions, while the strong players performed significantly better on the 'real' positions than on the random ones.
      I don't remember where this information comes from, it was probably some chess book or other :)

      Daniel

      --
      Hurry up and jump on the individualist bandwagon!
  65. Re:This is so true, but it's getting harder by bradleyjg · · Score: 1

    Two points:
    1)Logically no perfect compiler can produce better code then a perfect human coder.
    Note this statement has a couple of implicit statements - given infinite time, and the code may be of equal quality.
    This is because the perfect human coder programmed the compiler, and programmed the application. Since the compiler acts in a deterministic fashion the programmer can create exactly the same code as the compiler.
    2)The issue of dynamically recompiled code is indeed an interesting one. Whether or not the extra overhead need to monitor and recompile is worth it can not be determined ahead of time. Most likely this would be of most benefit to long running applications which use a variety of data sources (such as certain scientific applications).
    Bradley

  66. Re:Compilers dont write better code than humans by petermcanulty · · Score: 2

    This is a personal anecdote where I would say that the compiler generated better code than I would have.

    I worked on an embedded network analyzer product using the intel gcc compiler for the i960. It contained about 16000 lines of C code (not trivial, but not large) that included a bare bones operating environment, network drivers, and the actual analysis code. Of these ~16000 lines, about 500 of them were in the "common critical path" and would execute on every incomming packet.

    The intel gcc compiler for the i960 has this is really cool feature called "profile based optimization". You build an instrumented version of the executable, run a set of test data through it, get the profile data, and re-compile with the profile data. This optimization feature had the added advantage of closely integrating both the compiler and the linker.

    The main benefits of this optimization included:

    Setting the branch prediction bits for the most common cases.

    Inlining functions with high hit counts : both within a module (traditional inlining), and across many modules (an advantage of linker integration)

    Placing variables with high hit counts in faster memory (the i960 had 1k of on board, single-clock DRAM, and our card had both 2-clock SRAM and 4-clock DRAM)

    Inlining all functions that get called only once in the executable.

    I saw a 150% performance improvement over the non-profile based, but otherwise fully optimized builds.

    The key 500 lines were interspaced through about six functions in four modules, often in functions where half the function was error/unusual path handling. Although I could have rewritten those 500 lines in asm, it would have been very hard to keep it integrated with the rest of the code. Also when I examined the instruction trace of those 500 lines, I found only two or three places where I would want to change things.

    Intel's x86 compiler is supposed to have similar a set of profiling optimization features and another feature that's even more important on the x86 that lays out functions and data in the executable to reduce cache misses and page faults based on the profiling data.

    In the "me too" catagory: Optimizing for Alpha processors is a nightmare best left to the compiler. Just try to single-instruction-step through some Alpha code and see how much you move around within the C source.

    Peter (first /. post!)

  67. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0
    The fantasy is that you're some super studly asm c0der who can not only produce properly scheduled code, but also has the time to *completely re-write* it: a) For each new processor ...

    You make a totally valid point here, but I don't think that it is at odds with the original comment! Yes, one of the big advantages of using a compiler is that we don't have to rewrite swathes of code every time we change anything. (And this is true even if we stick to one architecture - just a minor change in algorithm can mean a lot of effort in asm.)

    But that's beside the point: the fact remains that if we do spend the time to write the code by hand, we can always do a much better job than the compiler. Compilers are, frankly, still rather crummy, but they're good enough that we put up with it to reap the very worthwhile benefits they offer.

    (I think we may all be in violent agreement...)

  68. Re:Match code frag. as opt. tech. (Was:Evolve code by Anonymous Coward · · Score: 0

    And I seriously doubt you've ever experienced the superior power a machine learning system has on some kinds of complicated optimization problems. While you are going into great detail about how to optimize the compilers even more you are running inside a big labyrinth and at the end have to rewrite and rewrite even your compiler again and again because you have to change some little things to make it perform better on this and this new cpu or architecture. This is the best example the author of the original article could have for writing unmaintainable code. On the other hand, if you step back and get the bigger picture and abstract the problem, you'll quickly see that over the time there will be a paradigm change towards computer-generated code. Whether it is just the translation from abstract languages into machine code that is done by some intelligent program, or whether it is the whole programming at all is a matter of time. The time will come whereas the exact definition of the problem is the solution. While today yet only relativly simple problems are solveable through the help of genetic programming systems, with the increasing computing power and even more important the increased networking of this power worldwide, we will be able to solve even the most complicated optimization problems in an evolutionary way, striving for perfectness. --scut

  69. Re:Compilers dont write better code than humans by interiot · · Score: 1
    until we tell it every nuance about the program, a human will still be able to use that knowledge to make faster programs.

    This isn't quite right. If either:

    • The programmer doesn't know everything about the target machine (eg. there's more than one), or
    • The programmer doesn't take the time to optimize the code (it happens all the time)
    then a JIT compiler could be faster than a human coder.
  70. Re:JIT Theory by TheFuzzy · · Score: 1

    While I agree with the critics that Java is probably not the programming language of a new machine architecture, I do think Just In Time, as an essential idea, is the future.

    Eventually, either the nanotech or the re-programmable chip scientists will have the necessary break through. Both chip "architectures", the one being based on random molecule distribution, the other on re-programming the chip as it works, will require software conditioning and optimization - over a *long* period, when measured in nanoseconds. In either case, the theory behind Crusoe and JIT is likely what we'll be using - and developers will be "teaching" new computers rather than loading assembly programs on them.

    Me, I'm looking forward to it.

    -The Fuzzy

  71. Re:Ooo, ooo, start on packing! by ebbv · · Score: 1


    This involves one of the internal debates I've had about C for years

    the debate here is not about C, it's about hiring nimrods. the problem is that people write bad code, and it's almost unavoidable. no matter how you change the language, there will always be twinks to write bad, unmaintainable, unportable code in it.
    ...dave

    --

    Think different? I'd be happy if most people would just think...
  72. Re:Ultimate optimisation needs better hints by zeroth · · Score: 1

    One nice feature of development with typical Microsoft tools is the use of ASSERT. In debug builds (where _DEBUG is #defined), it takes a form similar to C's traditional assert(), otherwise, it is opimized away to nothing. This lets you ensure the healthy state of the code that you're developing and then get a nice fast chunk of code to release.

    Now, what if the compiler were smart enough to examine these assertions and base some optimization on them? If you ASSERT( x>=0 && x, then the compiler should be quite comfortable in storing x in a 8bit register.


    --mark
    one step towards making better self-documenting code

  73. Hey everybody...Listen Up by spacewalrus · · Score: 1

    This whole question boils down to whether or not a human mind is deterministic. We can prove that a compiler is, or in fact that any program that runs on current computer hardware is. If a human mind is deterministic then it follows that a human could do just as good as a compiler simply by using all of the same algorithms to generate the code although the computer would certainly finish alot sooner. If a human mind is NOT deterministic then it is clear that a person who had no programming skill whatsoever could beat the best compiler given enough time, a specification sheet of the language he was to generate the given algorithims in and a circit diagram of the cpu he/she was coding for. This is VERY simple to see. A compiler can NEVER under any conditions for any reason EVER EVER EVER produce better code than a human because the human can ALWAYS produce the same code that the compiler did. The question should be is it reasonable to wait the 150 years of time it would take me to compile mozzila by hand or should I build a compiler to do it in a few hours. The next question would be how much time(if any) should I spend hand optimizing the compiler output. A question I have is why cant we do the same thing that a JIT compiler does before runtime? It seems clear that you could easily get very close to as much optimization from a multi pass profiling compiler as you ever could with a JIT without the runtime overhead. Later Spacewalrus

    1. Re:Hey everybody...Listen Up by Xeger · · Score: 1

      Not everybody would agree with your claim that the human mind is nondeterministic. Alan Turing was keen on the idea that the mind is a Turing machine. DNA is actually simpler than a Turing machine. So be careful when you pull a holier-than-thou routine on microprocessors...you just may be insulting your little cousins.

      Food for thought.

    2. Re:Hey everybody...Listen Up by spacewalrus · · Score: 1

      A very good point, however even if the human mind is deterministic if could still at least tie with the compiler

      Also I am not at all saying it is practical to stop useing compilers and just hand code everything. I was just sick of sitting here and seeing a bunch of people making statements like compilers will ALWAYS beat a human and similar things. Allmost every post here at the time I posted was claiming that compilers Always won and I just wanted to clarify to everybody that this is true only becuase we have very little motivation in general to attempt to beat the compiler.

      Hopefully the human mind is deterministic or I won't be able to download my brain into those computers in psycology today :)

      SpaceWalrus

  74. Re:Not so obvious by jilles · · Score: 2

    I had a commodore 64 once. Never bothered to learn assembler code for it but even today I'm amazed by what people did on those machines. But it just doesn't scale up to a modern PC.

    Perhaps those people from Intel can write code for trivial software but anything that goes beyond that is even beyond the minds of those people. Of course those guys at Intel can program their own CPU's manually (at least I hope so). Hand coding millions lines of assembly code is something else though, let alone optimizing it. The problem doesn't scale linearly you need a computer to optimize that.

    Either way a static compiler nor a human being can compete when you start to involve real time profiling information. It gives the dynamic compiler an edge, it can optimize to situations only known at run time.

    So, I hope you see my point. I admire people who can handcode a more optimal program than a compiler but they can't compete with dynamic copilers such as transmeta's code morphing or SUN's Hotspot because they simply don't have the kind of information those compilers have.

    --

    Jilles
  75. Re:Compilers dont write better code than humans by coreman · · Score: 2

    There is certainly a range of people that write code and I've seen both extremes. Poorly written higher level language programs aren't going to be improved by a compiler. There might be suggestions by a good level syntax checker but that isn't the topic here. One of the long term comments I've always made is that you can write Fortran in any language. If you don't understand the language constructs sufficiently, you aren't going to be saved by the compiler. There is no substitute for programming competence.

  76. Re:No. Hand coded assembler is faster. by GnrcMan · · Score: 2

    This post clarifies what I was saying. To summarize: 1. By "modern processor" I mean a processor that allows a compiler to perform good optimizations. The X86 instruction set is not very good for this. I don't believe the 486 even performed any parallel execution. Specifically, I have experience with the Alpha. 2. You are talking about a specific algorithm. As I said, 99% of the time the compiler will beat you. I didn't say assembly was dead or not useful sometimes. Just that most of the time the compiler will generate faster code overall than a human. 3. Facinatingly enough there are cases where your assembly optimization will actually hinder the compiler from doing a better optimization. Specifically, some paralellizing compilers I think.

    Check out this link for a plethora of information on compiler optimizations.


    --GnrcMan--

  77. Henry Massalin's "Synthesis" Dissertation by Allen+Akin · · Score: 1
    There are significant gains to be had from runtime code generation. In my opinion, one of the best papers ever written on this subject is Henry Massalin's dissertation on his "Synthesis" OS.

    Synthesis takes performance to a level that's impossible to match with code written statically, by using things like address information that's known only at execution time, as well as knowledge of how multiple reusable components are dynamically connected.

    Check it out (gzipped PostScript file): Synthesis: An Efficient Implementation of Fundamental Operating System Services.

  78. Compilers as good by Anonymous Coward · · Score: 1
    I'm a huge fan of compilers. I've long been one to say that compilers can produce better code than humans on a lot of architectures, usually because of the lack of intimate knowledge of the processor on the human's part. I've also been one to say that as chips increase in complexity there will be a time where humans just can't compete with compiled output. I think there are some misunderstood issues here though.

    First, JIT compiling equates to a type of peep-hole analysis, JIT compilers simply don't have enoguh information or enough time to do full global analysis. They aren't even in the same league with modern static compilers, there isn't any contest at all.

    Second, runtime profiling and optimization is a fairly new thing. Hotspot kind of does it and there are some research compilers that do it but it's still new and the pain/performance gain ratio hasn't been established as something that makes it "worthwhile" yet. It's also important to note that there are even more aggressive optimizations that can be done with static compiling. Some papers (sorry I don't have the links on hand here at work) have suggested as much as a 30% performance improvment with runtime optimization and particularly branchy code (that's where they shine, monitor branch selection and then recode loops to take the most optimal path most of the time to avoid stalling the pipelines on processors that can't take both paths, that sort of thing where a compiler needs to make a selection that can affect performance but doesn't have enough information) but it's still research. I don't know about you but a lot of developers don't particularly liek the idea of trying to debug code that dynamically get's rewritten as it executes. There is going to be more work done in this area as it is pretty hot but it is by no means the end-all yet. Again, you have to understand that runtime optimization isn't going to do global analysis and that's where a lot of the big advances in compiler technology have been made over the last 10-15 years. Higher order analysis and optimization usually yields far more performance than putting polishing tweaks on a few loops that can already be highly optimizated by a good compiler.

    Thirdly, java's dynamic nature is a key aspect of it that will always prevent certain types of code from being as fast as C or C++, regardless of the compiler. I'm not going to criticize various weaknesses or compilers for java, but this is something that needs to be remembered and understood. Also, processor time recompiling, reoptimizing, etceteraing code is processor time not spent running code. JIT needs to yield a performance boost that covers the time JIT takes, this isn't easy to do.

    I'm also a huge fan of garbage collection, it's particularly nice in large and sophisticated object models where you almost always lose track of who has a reference to what. GC can slow code down and it can also make certain types of code faster, both sides are right on that. The important thing is that it makes code better, most programs don't have time restrictions such that GC would affect them and most good GCs are so fast that the user wouldn't notice anyways. GC makes it easier for programmers to get away with being sloppy and while it may rob performance that's not the reason for it. I'm not sure that a discussion on GC is germane to our topic here. It's generally understood and accepted in research, academic circles, and it's becoming more so in actual development groups that GC is a "good thing" (tm) and it's worth the price of admission, software get's bigger and more complicated and we need tools to help us stop worrying about the minute details as much, it's an economic thing as much as anything because programmers tracking leaks aren't programmers adding value that you can market.

    Lastly, and maybe this is just due to my non-expertise on JIT compilers but the trend that I see coming is parallelism. Chip realistate is reaching the point where it's more efficient to put 2 or 4 processors on one die, extra cache isn't paying off as much and adding funtional execution units makes the design much more complex. This parallelism is going to need special compiling technology to exploit it, possibly including runtime profiling and re-optimization. I think that is where it's going and that JIT as it is now is a dead-end. Not that I mind the effort being put in to it but by the time someone get's java running as fast as C++ or C it will be pointless. Maybe I'm fooling myself here, I'd love to be wrong on it, maybe somebody will make some breakthroughs and we will get JITs that do global analysis and optimize code to work with the garbage collector and run as fast as C++ but I'm not going to hold my breath. Maybe I'm altogether wrong and parallel multiprocessor chips can be epxloited with a compiler running outside of the code on a different processor, I don't know.

  79. Re:Not so obvious by jilles · · Score: 2

    Your programmer would have to be really good to beat a modern compiler on optimizing for very complex instruction. Also he probably would need a lot of time even to get very simple programs up and running that way.

    And then there's still something a java VM (I'm assuming the hotspot type here) and that is dynamically (i.e. while the program is running) optimizing code using profiling information. This also works in situations where a program dynamically loads a class (possibly a class that did not exist at the point the other classes were developed). Your programmer cannot not take this situation into account when he is coding.

    In other words, under normal circumstances the assembler superman would lose (badly).

    Now, your next question will probably be why java is so slow then -> see my other reply's on this story.

    --

    Jilles
  80. exactly by Cris+E · · Score: 1
    You could write the uber-compiler described above, but it'd largely remain an academic or fringe tool (like, say, Beowulf) regardless of how cool or fast it is.

    It'd only become a good idea if you needed the performance so bad that you could tolerate the code maintenance nightmares.

    In many/most places of business it takes a *lot* of performance gain to excuse unreadable code. As the man says, I already have enough performance in most situations: I'm user-bound more often than processor- or disk- or network-bound.

    Cris E
    St Paul, MN

    1. Re:exactly by Kukester · · Score: 1

      You miss the point. He is saying that the compiler should generatt optimizations as good as hand coded assembler FROM easy to maintain languages like java and c/c++.

  81. Re:Profoundly counterintuitive? by rmstar · · Score: 1

    >C/C++ was not created so that programs could use
    >less CPU cycles; it was created so that
    >programs could be written more quickly
    >(programmer time), more robustly, and vastly
    >larger than their assembly
    >counterparts.

    I Agree.

    That's why you write most of the stuff in C/C++ and then if something is time critical you do the 500 instruction assembly thang. Or you buy a faster computer.

    >But the scope of problems you can solve in those
    >500 lines are so unbelievably small that its not
    >worth the time to test it.

    I disagree. That is what people have been told over and over again, but this is wrong. For example, you can solve the 8 queens problem in 500 assy lines, and there are a lot of problems in that complexity range. And the fun factor is incredibly high.

    rmstar

  82. Re:Compilers dont write better code than humans by GnrcMan · · Score: 2

    As a side note: I remember hearing FX!32 also keeping some information from the JIT compilation and storing it on disk for the next time the
    application is run. Anyone knows whether this is still used and to what success?


    Yes, FX!32 would actually profile the application and convert parts to native Alpha code, storing the native code in a database. Incidentally, the profiling happened during execution, while the optimization happened during free time, after the program was done being executed. The more you ran the program, and the more code paths you hit, the faster it would get.

    It's a moot point now, though, since NT doesn't exist for the Alpha anymore.

    --GnrcMan--

  83. I wonder if it's really two cpus by jhonan · · Score: 1

    I strikes me that the Crusoe could have have effectively two
    complete cpu's inside. One to do code morphing / JIT, the
    other to do VLIW processing. They could have entirely
    different instruction sets.
    The jit cpu could be working independantly of the VLIW cpu,
    without 'stealing' it's cycles, and feeding it.
    The VLIW cpu may not need external bus connexions.
    Otherwise I'm puzzled about how you you would get the
    performance claimed.
    Jamie

  84. Re:Compilers dont write better code than humans by technos · · Score: 2

    Massivly parallel creations are largely very close to what the programmer wrote, good compiler or no. Compilers for this case are especially weak, and only make vague and half correct assumptions.

    --
    .sig: Now legally binding!
  85. Dynamic Compilation by jmaessen · · Score: 1

    This is a timely post for me, as I recently attended the ACM workshop on Dynamic and Adaptive Compilation & Optimization. I thought I'd highlight a few interesting projects which bring up issues that haven't been discussed here. I'll pull these mostly from the panel, which described full systems rather than interesting techniques).

    First, note that code which is structured clearly and readably for programmers isn't usually structured perfectly for compiler optimization. Calling conventions add overhead, inlining adds code bloat---and in the presence of e.g. dynamically linked libraries, there's no opportunity to optimize.

    One of the biggest overheads is dynamic branching---this is why dynamic method dispatch in Java/C++ is slow, but it also means that the simple act of returning from a function call is also slow. From this perspective, the Dynamo project at HP ( http://www.hpl.hp.com/cambridge/projects/Dynamo ) is particularly fascinating (I've heard tell that many of these folks now work at Transmeta). What dynamo does is it _interprets_ fully-optimized native binaries! When it spots hot code, it unrolls it, determining where all the dynamic branches are going. The unrolled hot code is then optimized and run without interpretation. The result? When dynamic branches usually go to one place, the unfolded code can be executed in a straight line. The speedups (again, with already- optimized code) are impressive.

    Second is the Jalapeno system from IBM ( http://www.research.ibm.com/jalapeno ). This is a dynamic Java system written entirely in Java. It does optimizations akin to Sun's HotSpot. The interesting idea here, though, is the ability to optimize across the run-time boundary---because the system is Java code as well, Java internals can be inlined and specialized within user code.

    Finally, few people seem to know about partial evaluation. The idea here is that a lot of code depends heavily on a few run-time parameters which get set once and then (almost) never change. So
    what we can do is generate a partially-optimized program where we "fill in the blanks" at run time. Note that these blanks would normally be references to variables, run-time condition checks, and so forth---so the partially optimized program is already at least as efficient as our original code run through an optimizing compiler. At run time the blanks are filled in and some more (usually simple) optimizations are performed, usually making the resulting code substantially more compact and faster. Two extremes (both for C programs) are the DyC project ( http://www.cs.washington.edu/research/projects/uni sw/DynComp/www/ ) which acts like a dynamic compiler for C and Tempo ( http://www.irisa.fr/compose/tempo ) which requires explicit annotations but can probably be a bit more aggressive as a result.

    -Jan-Willem Maessen

  86. short answer by TheDullBlade · · Score: 2

    I'll simplify:

    1)Anything that can be expressed in Java can be expressed in hand-coded assembly; the reverse is not true.

    2)Humans are smarter than computers.

    3)Humans can always use computers as aids to their work, and so produce output equivalent to what the computer can produce as a baseline worst case.

    --
    /.
  87. just plain wrong by TheDullBlade · · Score: 2

    On a machine with 64 bit ints, as on a machine with 32 bit ints, ints and longs are the same (assuming the compiler was competently designed; it could fit the standard without being). Using long should only ever confer a performance hit on 32-bit machines, for 32-bits and over, long is pretty much guaranteed to be the same as int (AFAIK, the standard doesn't force it, but it doesn't require any more range in the long, so it would be stupid to make it bigger).

    If you want to define your own special int ranges, you can always use Ada. That's the kind of language you get when you throw in every nice little feature.

    --
    /.
    1. Re:just plain wrong by Scott+Wood · · Score: 1
      On a machine with 64 bit ints, as on a machine with 32 bit ints, ints and longs are the same (assuming the compiler was competently designed; it could fit the standard without being). So I guess gcc, ccc, and all other non-NT compilers for the alpha are not competently designed? They use 32-bit ints and 64-bit longs. (NT has a 32-bit long).

      --

  88. Java not a useable language? by Anonymous Coward · · Score: 1

    That's just not true and you know it. If the standard libraries are Java's (only?) strength then how could it's object orientation not be a strength too since the standard libraries are object oriented. Also, portability is much greater in Java than in most other languages. Consider the fact that there's no need for something autoconf in Java as an example. Sure, the same's true for Perl, but that's not a language, it's an implement of torture.

    1. Re:Java not a useable language? by ^me^ · · Score: 1

      This is turning out to be a programming language war, majorly. Don't bring perl into a discussion about java, man. You're using it as a scapegoat. programming language wars == off topic.

      --
      No one ever says, 'I can't read that ASCII E-mail you sent me.'
  89. Wait a sec. by aheitner · · Score: 2

    I can also write code in ML that can be proven correct at compiletime, since it's statically typed and lexically scoped (Java can still come up with an exception and die on you).

    So my question to you is: How are your languages so magic if none of the software I use on a daily basis uses them, or seems to miss their properties?

    Some guys here at CMU wrote a TCP/IP stack in ML that was provably correct. It took them a whole year. Was it worth it? You could write a whole operating system in that time!

    ...

    The stack-based ISA for Java is slow. It doesn't map well onto actual processors. It's a big reason that even JITC'd Java isn't very fast.

    There's no reason not to use a language like Java with slightly stronger typing than C++ (although there are good reasons to use C++; there are powerful things C++ can do that Java can't). But it should be compiled, even if "compilation" is something that happens the first time you run a program, and it should be compiled from a more intelligent format than a nonexistent ISA. Why not distribute source code after the frontend has run on it, with all the symbols resolved and the syntax broken up into lower-level constructs? That would be as hard to reverse-engineer as assembly, and significantly moreso than Java classes (which leave way too much info in there).

    ...

    Being able to state with supreme confidence that this type of bug does not exist in my code because it has been rigorously proven is helpful. You really should be more concerned with running correct code than running "fast" code.

    Except for the fact that we've gotten everything done so far by writing fast code, and testing it to see that it's reasonably bug free. Proving code with Java is still quite hard, and writing code at all in ML is very hard.

    Empirically, companies will trade off getting the software written for having it proven; they've been doing things that way for quite a long time. Java certainly won't change that.

  90. Re: Compilers - Deep Blue vs Kasparov by iang · · Score: 1
    It's just like Deep Blue vs. Kasparov again.

    Does this mean I just have to wait until I can afford something with the power of Deep Blue and then I'll have a compiler that can generate better code than a person? Given current exponential rates of improvement I guess that's not too bad.

    Problem is, I'm not sure it's true. If you consider chess to be an optimisation problem, I have my suspicions that it's actually a good deal more complex than chess. Chess is relatively narrowly-defined. It feels like the space that code optimisation has to work in could be far broader?

    It certainly feels like a different problem - one simple way of doing better than the compiler is to look at what the compiler produces and then remove all the stupid mistakes it's made. (Modern compilers tend to make 3 or 4 on every page of code. And I mean really stupid - completely failing to notice that values were already available in a register and reloading/recalculating them, that sort of thing.) This is not likely to be a successful approach with a chess game...

    --
    Ian Griffiths
  91. slashdot ate my < ! by TheDullBlade · · Score: 1

    that should read "should only ever confer a performance hit on <32-bit machines"

    When I hit preview, it changed the &lt; to < in my text, and then thought it was the start of a tag when I hit submit.

    Lesson: Don't use preview!

    --
    /.
  92. theory vs practise by Anonymous Coward · · Score: 0

    In theory you are right. A compiler should be able to produce better code than most any human. In practise it doesn't work out that way cuz compilers are just more complicated than most people expect. In theory, theory = practise. In practise, not.

  93. Re:No. Hand coded assembler is faster. by GnrcMan · · Score: 2

    No. Hand coded assembler, coded by a clueful programmer, will always be better than compiler generated code.

    You are wrong.

    With modern processors, a decent compiler can do better general optimizations than you or any other assembly programmer I've ever heard of. While there is that (less than) 1% of the time when you can hand optimize a small piece of code, that's the exception rather than the rule. The rest of the time you are better off letting the compiler do it's job.

    --GnrcMan--

  94. I'm from Kansas by donglekey · · Score: 1

    hey, I'm in topeka, the capital of Kansas, and while yes I don't like Kansas, just because there are idiots who by some miracle got responsibility, it doesn't mean that there aren't smart people in Kansas. Ever heard of newtek or Quvis? http://www.newtek.com http://www.quvis.com

  95. Re:Compilers dont write better code than humans by jekk · · Score: 1
    I'd like to respond to your Letter A:

    > In addition, I simply don't believe it's impossible to
    > a) Tell the compiler in more detail how to optimise your code

    I've always wondered why there aren't programming languages out there that provide direct support for this. There are lots of sorts of optimization hints that we programmers ought to be able to provide to the compiler... knowing that not all compilers will be smart enough to take advantage of all of the hints. But I've never seen a programming language that supported compiler hints in the language level. Am I simply ignorant, or is there a reason why people don't build this into languages?

    -- Michael Chermside

  96. Re:Compilers dont write better code than humans by Brian+Feldman · · Score: 1
    The difference is that you got it wrong. Read what you wrote earlier.

    --

    --
    Brian Fundakowski Feldman
  97. Apples & Oranges was, Re:Not so obvious by Tassach · · Score: 1
    JAVA version = program code + VM + JIT
    Asm version = program code

    You can't compare Java to Assembler. They are different languages and have (or at least should have) different purposes. While it is possible to write highly efficient Assembler code, this comes at a high price in terms of programmer time & expertise.

    As an experiment, try writing a fairly simple program (a basic implementation of the *nix sort command, for example) from scratch in both Java and Assembly. Personally, I could probably have the Java version finished in an hour or two, while the Assembly version would take me (at least) a whole work day. (I'll admit that my assembly is a bit rusty) Yes, the Assembly version would be smaller, faster, and more elegant, maybe even noticably so on large files; but the Java verson would be done and working in a fraction of the time.

    Back in the good old days, programmer time was far cheaper than CPU time -- an extra couple of days worth of coding to shave off a few K or a few CPU cycles was a cost-effective use of your resources; now, however, the reverse is true. For most modern applications, the small increase in performance you could potentially get from writing in Assembly just isn't worth it in terms of time and aggrivation.

    Different computer languages have their own strengths and weaknesses. I would not willingly use Assembly to write a GUI front-end for a database application, any more than I would use Java to do somthing like signal processing on an imbedded microcontroller.

    --
    Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
  98. Re:Compilers dont write better code than humans by GnrcMan · · Score: 1

    1. Why the hell would you have code like that? It sounds like trying to reuse a variable for two different purposes. Declare two different variables and the compiler will sort it out.
    2. Why would you want to use the upper byte for a different value? Unless your working on a weenie chip with only four registers....oh.
    3. Byte operations suck on all modern processors I know of...including Pentiums. Congradulations, you've just un-optimized your code.
    4. Modern compilers can perform the optimizations you describe, If the optimization is beneficial...which is dubious in this case.

    --GnrcMan--

  99. Re:Profoundly counterintuitive? by Anonymous Coward · · Score: 0

    I do not intend to insult the authors of FFTW, but their FFT benchmarks are severely flawed and biased in favor of their own algorithm. I know (based on personal experience) that IMSL and the SGI HPC libraries are significantly faster than FFTW across a broad range of parameters. Portable C libraries exist (eg. Bernstein's FFT) that are faster than FFTW by a significant margin.

  100. Re:Compilers dont write better code than humans by AndrewHowe · · Score: 1

    Heh, well maybe not so violent :)
    But, the devil is in the details :) And as I thought of my reply I remembered the old Emo Phillips gag, which I trawled the 'net for, and here it is...
    (Score: -1 for Off Topic, +10 for Funny)
    -----------------
    I was walking across a bridge one day, and I saw a man standing on the
    edge, about to jump off. So, I ran over and said, "Stop! Don't do it!"
    "Why shouldn't I?" he said.
    I said, "Well, there's so much to live for!"
    He said, "Like what?"
    I said, "Well...are you Religious or Atheist?"
    He said, "Religious."
    I said, "Me too! Are you Christian or Buddhist?"
    He said, "Christian."
    I said, "Me too! Are you Catholic or Protestant?"
    He said, "Protestant."
    I said, "Me too! Are you Episcopalian or Baptist?"
    He said, "Baptist!"
    I said, "Wow! Me too! Are you Baptist Church of God, or Baptist Church
    of the Lord?"
    He said, "Baptist Church of God!"
    I said, "Me too! Are you Original Baptist Church of God, or are you
    Reformed Baptist Church of God?"
    He said, "Reformed Baptist Church of God!"
    I said, "Me too! Are you Reformed Baptist Church of God, Reformation of
    1879, or Reformed Baptist Church of God, Reformation of 1915?"
    He said, "Reformed Baptist Church of God, Reformation of 1915!"
    I said, "Die, Heretic Scum...", and pushed him off.

  101. Re:Self-Modifying Code by skybrian · · Score: 1

    > if your JIT VM forces periodic cache writes to
    > RAM (Not every time but enough times to ensure
    > some sort of coherence if the cache...

    Actually, a little-known fact (surprise to me anyway) about the Java specification is that it
    only guarantees a cache sync when starting or finishing a synchronized block, or when reading or writing a variable marked as volatile. (See Doug Lea's book.) Bugs like this are nearly impossible to test for. I wonder how much Java code will break on multi-CPU machines?

  102. Re:Compiler will be faster than JIT by Anonymous Coward · · Score: 0
    Dude - That is not 'optimizing', that is designing. When you switch from using a bubble sort to quicksort, people don't say "Wow, what a great optimiziation." That is picking out the right algorthim for the task. Now, picking the right algorthim isn't always easy (like knowing not to use bubble sort) - but that is a distinct task from 'optimizing'. Of course the computer isn't going to do that part for you. But the question is given you use a FFT, can you pluck down the optimized asm that will beat the compiler's asm for the high level language implementation?

    Slothmonster

  103. Re:Profoundly counterintuitive? by PeterBecker · · Score: 1
    I don't believe an experienced assembly coder can beat or even equal an optimizing compiler in the long run. This has different reasons:
    1. He has no time to do it
    2. He is not able to keep track of his own optimizations -- at least not for systems of interesting size
    3. He will make errors. To say it short: A compiler that makes errors is buggy, a programmer that makes errors is human. I can fix the compiler but not the programmer.
    4. The compiler is build upon knowledge from many good programmers. This knowledge can increase for a long time. The programmer will reach his bounds before the compiler will. And he will eventually forgot things.
    Don't misunderstand me: if a profiler tells you that there is a function consuming 10% of your CPU time and you are experienced in assembler: go for it. Compiler are not that good today. But they will be - more sooner than later. And: optimizing code is not really the creative problem. Designing code is -- if optimizing means redesigning for you, you should think on your designs earlier and maybe on a more abstract level. But maybe I'm just blinded by OOA/D/P and you're right. Time will tell. Peter
    --
    -- CAUTION: Don't read this posting.
  104. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

    For example, in x86 assembler it is possible to save some cycles (and memory accesses) by using 16-bit or even 8-bit registers instead of 32 bit registers.

    Plain wrong. (Not only) in this special case you have to consider the native register width of the cpu, the width of different busses, first level, second level cache strategies, main memory access strategies and architectures.

    For example, every capable os on x86 is nowadays using code segments which default to operations being 32 bit wide. Within the same opcodes it is also possible to switch to 8 bit wide operations. But if you use 16 bit wide operations, you'll have to add a prefix byte every opcode telling the cpu to use the 16 bit variant. Depending on the whole scoup of caches and memory subsystem this can actually slow things down.

    At this place we don't have even considered different cycle times for the various x86 derivatives: every x86 cpu, whether it's an Dementium(tm) or some other cpu, executes the same instruction using a different number of cycles.

    Conclusion: while in some cases humans can actually write better code for one cpu and mainbord, we're better off letting the system itself collect appropriate statistical data and use that to optimize on the fly step-by-step.

    As a side note: I remember hearing FX!32 also keeping some information from the JIT compilation and storing it on disk for the next time the application is run. Anyone knows whether this is still used and to what success?

    Harry Deer

  105. A counterexample by David+Roundy · · Score: 1
    BTW, an experienced assembly coder will always beat or at least equal the optimizing compiler, because if nothing else works he can always look at what the compiler produces and see if he can improve on that. Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?

    There's a wonderful counterexample to your arguments, which is FFTW (the fastest fourier transform in the west). This is a code that dynamically generates a highly optimized algorithm for doing FFTs. It is written in 100% C, and outperforms (not all the time, but often) hand coded highly optimized FFT routines.

    It does this by actually running timing tests, and using the results to optimize the algorithm for your computer, for the size of array you want transformed. It is quite amazing. Here are the benchmarking results from a RS6000 of some sort, comparing (among others) FFTW with ESSL, the IBM hand optimized FFT library.

    This is sort of a strange example, in between JIT compilers and normal compilers, and is a case where the compiler actually knows a heck of a lot more than the programmer does. Anyhow, figured I'd point out a cool code that is able to optimize itself.

  106. Re:Not so obvious by Anonymous Coward · · Score: 0
    I was simply giving the obvious answer that:

    JAVA version = program code + VM + JIT
    Asm version = program code

    The obvious wrong answer, because it assumes "program code" as a constant. A more accurate version:

    JAVA = program code version A + JIT overhead
    ASM = program code version B

    Without any knowledge of how program code versions A & B compare, we can't confidently say that ASM is better.

  107. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

    0xFF == 255 ...

  108. Re:Doesn�t work in practice by cje · · Score: 2

    Its wrong to have the most basic data type wobble over the bit length.

    No. This simply wrong, and almost humorously so. What are you going to do? In the C standard, are you going to say "Implementations shall make int 32 bits wide?" What do you do on a machine (say, an embedded system) with 9-bit bytes? What about a Cray YMP? The goal of the C standard is to have the implementor pick representations for the basic types that make sense for the target machine. Performance is nice, but in the end, they have to make sense. Now, there are some limits in the form of minimal ranges that are placed on the "basic" integral types, but that's about it.

    The solution would have been to have the basic data types with fixed size, and additional data types with "some size between 16 and 64 bit giving the optimal performance on the target machine".

    You should be more than pleased with the next revision of the C language standard, C99. Among other things, it standardizes such types as int16_t and int32_t, which provide exactly the functionality that you seem to think is important to portability.

    This would have saved many people a lot of grief...

    What would have saved many people a lot of grief is if they would have learned C correctly in the first place. They would then know what it guarantees and what it does not, and how they can use the language facilities (i.e., typedef to satisfy their non-portable desires.) The amount of junk I see in code written even today is staggering! (assumption that short is 16 bits, long is 32 bits, etc .. casting the return value of malloc(), etc ..)

    --
    We're going down, in a spiral to the ground
  109. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

    >The metric that I given in my university computer >architecture class was that a compiler can
    > generate better code than 90% of assembly >programmers.

    Did you get any more, or just a soundbite from your prof?

    Does that mean, "90% of assembly programmers" in your class, in the industry, or in some anonymous survey?

    Do you wonder what the statistic would say if the "assembly programmers" were chosen from among those known to have a passion for optimization?

    Consider this: the person who can come up with the compiler optimization that makes a big difference is probably in the 1% of the programmers in your statistic. And do you think that if your 90% of the programmers out there were the ones writing the compilers, they would be so efficient?

  110. Sun's Windows Java2 JVM grows its heap slowly by billr · · Score: 1

    Write a Java program that allocates, say, a million 26 element arrays of char, or bytes or whatever primitive type makes you happy. Make sure to hold on to each reference.

    Run it a few times with the default memory settings and pay attention to the time. Also, watch it (in whatever tool your OS provides, task manager for NT) and try to gauge about how much memory it needs. Call this trial 1.

    Now, run it again, but set the starting heap size big enough so that it doesn't have to allocate much or any memory. Note the time. Call this trial 2.

    Now, write the corresponding program in C or C++. Allocate a bunch of arrays on the heap. In all fairness, you should calloc them because Java initialized array elements to 0 for you.

    Run it and note the times. The time differences between the c/c++ and trial 2 program should roughly equate to the raw performance of "internal" memory allocation. And the difference between trial 1 and the c/c++ version should roulghly account for the extra time it requires for the JVM to grow its own heap.

    I did this awhile ago, and I wound up with numbers like these.

    trial 1: ~7 sec.
    trial 2: ~3 sec.
    c/c++: ~2 sec.

    So the internal guts of that JVM on that day under those conditions put it about 50% slower than C/C++. But when that JVM on that day under those conditions had to deal with the operating system to grow its heap, it really started to stink. Note that the C/C++ program had to grow its heap to complete.

    --
    I've finally found the off by one erro
  111. Re:Sun license? by um...+Lucas · · Score: 1

    Crusoe CAN'T run java natively. It has it's own VLIW instruction set. On top of that sits the code morphing engine/JITC. So Transmeta can easily point to that as being proof they haven't built a processor that runs Java "natively".

    Intel is definetly the entity to worry about so far as Crusoe goes. Did transmeta pay lioensing fees for MMX, etc? I'm sure intel's scrambling right now trying to find how they can keep this chip off the market. They'll probably end up with some sort of patent licensing agreement in the end. After all, transmeta probably want SSE (is that it?) instructions eventually, and imagine if they could produce a module that just dropped into socket 370's? Intel would probably die for their software simply because that would be much more efficient than either developing their own code morphing stuff or building x86 translation hardware into merced.

  112. Re:Compilers dont write better code than humans by AndrewHowe · · Score: 1

    Who the hell moderated my post to "Troll"? We just had a really good discussion on this stuff. What part of "Troll" do you not understand? Maybe it was a slip of the mouse...

  113. just plain wrong (me this time) by TheDullBlade · · Score: 2

    Hmm... That bites.

    If you ask me, it is a design flaw. Longs are only required to have the range of a 32-bit number, so it is wasteful to include more bits. A lot of portable code will have unnecessarily bloated memory use and any reliance on the extra 32 bits throws portability out the window. Given that, a non-portable extension for 64-bit ints (since full use of 64-bits is inherently non-portable; with a lot of sizeof checking some use is possible, but not convenient) to be used only for system specific optimizations would probably be a better choice.

    Given that most code run on Alphas probably isn't written with a 64-bit long in mind, it almost certainly hurts more than it helps (especially with memory access being the main bottleneck of modern systems).

    The next C standard should fix this problem by providing a 64-bit int (if you ask me, they should just allow you to stack longs, doubling the bits of the minimal range each time; writing arbitrary bit size emulation into the compiler is trivial, though the trivial implementation is inefficient, and it would be a permanent fix that could have surprising benefits for certain supercomputer applications), as well as the addition of fixed bit size types (which is, I suppose, good for networking code, though I fear people will abuse it; IMHO it's better the way it is where you should chop everything up manually into whatever format the protocol takes).

    --
    /.
    1. Re:just plain wrong (me this time) by Scott+Wood · · Score: 1
      Hmm... That bites.

      No, it doesn't. There would be far more breakage with a 32-bit long on alpha.

      Longs are only required to have the range of a 32-bit number, so it is wasteful to include more bits.

      1. IIRC, C requires long to be large enough to hold a pointer. On non-NT alpha, that's 64 bits.
      2. 64 bit operations on an alpha are no slower than 32 bit operations, so it's not really being wasteful.
      3. Why use a 64-bit chip if you're only ever going to use half of it?

      which is, I suppose, good for networking code, though I fear people will abuse it; IMHO it's better the way it is where you should chop everything up manually into whatever format the protocol takes

      Some people abuse virtually all aspects of the language as it is; I don't think that should be the only thing keeping new features out. As for protocols, with types of a guaranteed size, you can use the struct as the protocol itself (as long as you put everything in network byte order, of course... and htonX functions for the fixed types would help.) It also helps for accessing structures on disk, talking to devices, etc.

      --

    2. Re:just plain wrong (me this time) by TheDullBlade · · Score: 2

      "64 bit operations on an alpha are no slower than 32 bit operations, so it's not really being wasteful."
      Think of the memory consumption (if it's not being wasteful, why then did you claim that in this case using long was sub-optimal?).

      "IIRC, C requires long to be large enough to hold a pointer. On non-NT alpha, that's 64 bits."
      IIRC, that's an old K&R thing and in ANSI C, pointers and ints are completely different animals and can never be inter-converted. Or maybe that's just the upcoming new ANSI standard.

      "Why use a 64-bit chip if you're only ever going to use half of it?"
      Indeed, why use a 64-bit chip at all? (I don't like the current trend of building bigger and bigger chips; we should be learning to make more and more of them work together) But more on topic, very little portable ANSI C code will make profitable use of a 64-bit long (yes, you can write code that checks just how big it is, and do, say, bit masking operations on more bits in parallel, or write a more efficient BigInt library). If you are going to write non-portable code, why even bother sticking to the standard? I like the "long long" solution that some compilers use; for one thing, it immediately tells any other compiler that this code isn't portable ANSI C.

      --
      /.
    3. Re:just plain wrong (me this time) by Scott+Wood · · Score: 1
      Think of the memory consumption (if it's not being wasteful, why then did you claim that in this case using long was sub-optimal?).

      I don't remember saying that, though it is a valid concern. Overall, though, I doubt the breakage that would result from sizeof(void*)>sizeof(long) would be outweighed by the memory savings.

      IIRC, that's an old K&R thing and in ANSI C, pointers and ints are completely different animals and can never be inter-converted. Or maybe that's just the upcoming new ANSI standard.

      I don't have the C spec here, and it's not in the C++ spec, so I'll take your word for it. However, without an integer type big enough to hold a pointer, how are you going to compare a pointer to another using operators like <? And there is a lot of code out there that does assume that sizeof(void*)<=sizeof(long).

      Indeed, why use a 64-bit chip at all?

      Because a 32-bit address space is too small? So that when you need to manipulate 64-bit data, you can do it efficiently? Why use 32-bit processors, then? Once upon a time, a 16-bit CPU was a high-end chip (in the PC world, at least).

      But more on topic, very little portable ANSI C code will make profitable use of a 64-bit long

      And how much code out there is portable ANSI C?

      If you are going to write non-portable code, why even bother sticking to the standard?

      So that the code can run on 99.9999% of the machines someone may want to run it on, instead of, say, 85%, or 50%, or 1%, depending on how far you deviate from the standard and what assumptions you make?

      --

    4. Re:just plain wrong (me this time) by TheDullBlade · · Score: 2

      "I don't remember saying that"
      Sorry, you didn't, that was AndrewHowe. I wasn't keeping track of who is on this thread.

      "I'll take your word for it"
      Don't take my word too seriously, I don't have the C spec either. I'm going on what other people have said about the standard. And like I said, it may have been about the upcoming standard, not the present one (you've gotta love it, it's finally been around long enough for people to comply and then they go and change the thing).

      "However, without an integer type big enough to hold a pointer, how are you going to compare a pointer to another using operators like do wish that methods of handling larger ints were specified; I mean, even the x86s have special features to handle 64-bit ints, which you just can't access using ANSI C). Actually, I'm rather fond of the idea of 24-bit chips. Chuck Moore thinks 20-bits is enough, and I really respect him, but he also thinks it's an acceptable tradeoff to use a character set where ones and lower case Ls are the same (I really recommend reading around his web site, while he might be a little over the edge, he still has a lot to teach the rest of us).

      The fewer excess bits you use, the smaller and faster (not to mention cheaper) the chip can be.

      "So that the code can run on 99.9999% of the machines someone may want to run it on, instead of, say, 85%, or 50%, or 1%, depending on how far you deviate from the standard and what assumptions you make?"

      Nothing that depends on a 64-bit long (and very little that can benefit from it) is going to run on 99.9999% of machines (it will, in fact, be broken on many 64-bit machine compilers), which was my point.

      --
      /.
    5. Re:just plain wrong (me this time) by Scott+Wood · · Score: 1
      Nothing that depends on a 64-bit long (and very little that can benefit from it) is going to run on 99.9999% of machines

      No, but code that assumes sizeof(long)>=sizeof(void*) will run on the vast majority of computers in common use.

      --

  114. Assembly = slow, java = fast, me = santa claus by Anonymous Coward · · Score: 0

    Yeah, assembly is really slow
    Java is really fast
    That's the reason why we use java in the kernel, not assembly
    That's also the reason why assembly is the fatest code and so on...
    I can't believe in such article
    Even the best compiler in this world can't generate code as good as handcrafted assembly by a human (yeah, I know, this may depend on human skill)
    Of course is too much work to do to write a whole program in assembly language... but ain't every day I have to completely rewrite from scratch my kernel, IDE drivers, Sound Blaster Stuff...
    Tell these guys to make an DEMO like that from Hornet Underground in a high-level language

  115. Re:Comparing JIT Misses The Main Point ! by PhiRatE · · Score: 2

    Somebody moderate this guy UP. He's hit the nail on the head with a huge sledgehammer, this is _exactly_ why code morphing is different, and far more spectacular, than the JIT concept.

    Although I still want access to the morph layer to play :)

    --
    You can't win a fight.
  116. Java Numbers and the Cost of JIT Compilation by lgas · · Score: 1

    I have two items to contribute.

    First is a reply to various people's assertation that whether or not JIT can speed things up, you pay an initial cost for it:

    This is a technically true statement, however it is misleading. There is obviously a cost for compiling code at runtime, however the question that should be asked is does this cost outway the benefit of the optimization that it provides?

    If you start with the assumption that the JIT can provide code, that _once_it_is_compiled_ runs faster than that which an average human coder writes, then it is simple to realize that in some situations, the JIT compilation will kill you, and in others it will

    The first case would be if you were to write a program like the unix command 'date' in Java. The time it takes the compiler to compile this code is always (I'm guessing) going to be slower than it is to just yank the date out of the hardware and spew it to the screen. Plus after the Java program compiles, it still has to make calls to native code to yank the date out of the hardware and spew it to the screen, so it's doomed to be slower than just about anything but the worst handcoded assembler or even C.

    However, the second case is that you are writing a major application; perhaps a web application. The application starts up, compiles all the classes, you lose maybe 2 or 3 seconds to this process if you're running a huge amount of code on a crappy processor. However, the application runs for 200 days, constantly handling mortgage applications, or doing text indexing, or selling t-shirts or whatever. During this time, the code that took 2-3 seconds to optimize is executed millions and millions of times. It's obvious that you would benefit greatly from a JIT compiler in this case (assuming you accept the base assumption that the JIT can created better code).

    So, IMHO, there's not much more to say about whether the JIT causes more of a performance hit than it provides so I think everyone can safely go back to discussing whether or not it can really outcode a human. I'll save my two cents on that for another day.

    In the meantime the second thing I'd like to add is a set of numbers I've come up with while working on a paper about whether or not Java is a viable platform for systems with high performance requirements. I plan to release the paper soon and when I do I'll try to post about it here, or you can contact me by e-mail if you're specifically interested. In the meantime here are some example numbers I've seen trying to compare Java, C++ and C performance.

    Test 1 - Integer Divisions
    Average time in C, w/ -O3: 00:00.76
    Average time in 1.2 JVM with JIT: 00.02.30
    Average time in C: 00:08.18
    Average time in 1.1 JVM with no JIT: 00:50.97

    Test 1 - Integer Divisions on a different (slower) machine w/ C++:
    Average time in C: 00:01.2x
    Average time in C++ (no optimizations): 00:15.3x
    Average time in C++ (w/ optimizations): 00:15.3x
    Average time in Kaffe 1.1 VM w/JIT: 00:18.7x
    Average time in Sun Java 2 JVM w/ JIT: 00:18.7x

    Test 2 - Integer Divisions in Multiple Threads
    Average time in C w/ -O3: 00:07.52
    Average time in 1.2 JVM with JIT: 00.18.74
    Average time in C: 01:13.04
    Average time in 1.1 JVM w/ no JIT: 08:30.00

    I'll leave it to you to deduce what you like from these numbers, but the first thing that surprised me was that the optimized Java code consistantly stomps all over the unoptimized C code. After thinking about it for a while this makes sense, but how often do people remember to throw in the -O3? Sure, on commercial software that's going to be burned on CD and shipped to millions of people, they probably remember... but in a Makefile in an Open Source project? Or even in smaller commercial shops? How many average C programmers are even aware that it can make a 10x difference in speed? If you are using a JVM with JIT compiling, you will always be running optimized. Another interesting thing is that I do not have a machine I can run Sun's HotSpot compiler on to compare it, which I will be doing prior to finishing the paper. Also at the rate things are accelerating it seems obvious to me that Java performance will catch and probably surpass C in relatively short order. It's just about already overtaken C++, which people said was 10 years away just a year ago.

  117. Re:Experience from Optimizing Java ... by Brian+Feldman · · Score: 1
    The only real difference between on-stack and on-heap allocation is that on-heap can be faster due to not having to call code to "gather" that memory. Heap and stack are both the same COW, zero-filled-on-creation, VM objects, with very little difference in semantics (i.e. growing up or down on page fault).
    If you tend to use "malloc()/free()/new/delete" a lot, you'll pull in more heap, causing page faults, and freeing it later, causing it to possibly be marked "unused" by the system and waiting to be page faulted again later. If you use the stack a lot, you cause the exact same kind of faulting and page reclamation (if marking stack unused).
    What's the big difference? Well, with dynamic memory allocation (a misnomer since "static allocation" is dynamic in much the same way) with malloc/free/etc., you run special code which allocates the memory, handles the alignment of it, etc. With on-heap allocation, you just force that work to be repeated (albeit in a smaller manner) in assembly as part of the function code. The compiler itself generates that, but it's still the same thing: managing your memory allocation with code.
    The difference is just in it being explicit or implicit: is it done by me, or the compiler? In any case, I'll wager that there will be absolutely no measurable difference in speed between something that uses a _good_ memory allocator to allocate everything on the heap, and a _good_ compiler while allocating things on the stack. Feel free to prove me wrong, since I'm easily swayed by that :) Note that I don't expect that I'm wrong ;)

    --

    --
    Brian Fundakowski Feldman
  118. Brute force and.. by xmedh02 · · Score: 1

    Yes, you are right, some chess programs do use a kind of pre-selection.. From what I remember from the times I have played chess profesionally (~10 years ago), the chess software works (or used to work those ~ten years ago) this way: it computes all possible positions generated by its move(s) up to some level. Then it evaluates all the positions with some "static function", i.e. counts it's material, centralized King in an ending etc., so each position gets it's ranking. Then the best one is chosen. Now, there are two schools: "pre-selection" and "brutte force". Pre-selection school generates the rankings only few moves ahead, and just trashes positions which look completely silly (i.e. have too low ranking), and only pursues further the promising candidates. The brute force school evaluates the full tree, sometimes getting a really great move. Now, entry-level chess programs use brute force, better programs use the pre-selection method, but the best ones brutte force. Or this is what I learned those 10 years ago, and this is oversimplified, you have opening libraries, endings specialities etc. I wonder what does this add to our discussion of brute force in optimization. :-)

  119. Sun license? by PapaZit · · Score: 3

    Doesn't Sun have a patent on hardware that can run java bytecode natively?

    I wonder where transmeta-ish devices fall. I'd call it software, but would sun's lawyers?

    --
    Forward, retransmit, or republish anything I say here. Just don't misquote me.
    1. Re:Sun license? by buckaroo-b · · Score: 1

      if this is the case it would be counter productive for sun. they have been banking on (w/ java) and continue to bank on (majik processor) the belief that JIT will not only be a reality, but become the norm in the next 2-3 years. even their corparate dopeheads must know by now that unless your MS you can't expect a technology to grow if you keep it to yourself. JIT compiling sounds great in theory, but if it can live up to that why don't we see a lot more of in use yet? i'm not jumping on the bandwagon 'til it's at least got wheels underneath it....

      --

      i have walked down train tracks, walked down train tracks, drunk at 3 a.m. it not magic, it's no great trick, w
  120. Re:Doesn�t work in practice by mikpos · · Score: 1

    You're right, and this was discussed, mostly because of me :(, on comp.lang.c some weeks ago.

    There is absolutely nothing you gain from an int16_t (signed int, 16-bits long) over a short (signed int, able to contain values from -32767 to +32767 (usually at least 16 bits)). If I can do it with an int16_t, I can do it with a short and do it much more efficiently.

    The place where people start crying is when you're trying to send stuff over the network or via files. int16_t again buys you nothing over short. You have to chop it up into chars (or 'bytes' for you non-C freaks) and endianise it first. In the end you're stuck with an unsigned char * (or a void *) which you're going to write to the FILE *, so int16_t doesn't buy you anything. This problem gets particularly sticky when you're dealing with platforms with chars ('bytes') of 9, 15, 18, 32, or 5.3e59 bits in size; in fact, I don't think there's any requirement in the C standard that the values have to be stored as bits at all. Mandating that an 'int' has to be exactly 32 bits long when the size of a byte is 64 bits on a platform is just retarded.

    Something that would be nice is a standard class for C++ which would do that for you. You do 'cout some_integer_16' and it automatically endianises and chops it up for you.

  121. Re:Compilers dont write better code than humans by prizog · · Score: 1

    I have seen a compiler use the16 and 8 bit registers when it knew the ranges because of some previous code. It was really neat to see how my C code became much faster ASM.


    But I still had to rip out the routine and re-write it by hand to use MMX.

  122. Compilers dont write better code than humans by tjansen · · Score: 5
    I have already heard that assumption that a compiler can generate better code than a programmer a thousand of times, but it does not get more true by repeating it - it is false. At least until compilers are able to understand the program. In every program there are things the compiler simply doesnt know. For example, in x86 assembler it is possible to save some cycles (and memory accesses) by using 16-bit or even 8-bit registers instead of 32 bit registers. You can double the number of registers by doing it. You need to be sure that the values in the registers are below 65536 or 256 to use these tricks, and the programmer can know this, but the compiler cant. The compiler might profile the possible value range if it is a really advanced compiler (I never heard of any compiler actually doing this), but it cannot be sure so it must at least check the values before for the case that they arent.

    As long as the programmer has more knowledge than the compiler, he will always find tricks to save an instruction here or there and outperform the compilers this way. You can find a great example of such programming tricks in the PCGPEs article about texture mapping inner loops here.

    1. Re:Compilers dont write better code than humans by Hard_Code · · Score: 3

      Right. I've always thought an ounce of design is worth a pound of optimization ;)

      Leave the design to the humans...leave the optimization to the compilers.

      Jazilla.org - the Java Mozilla

      --

      It's 10 PM. Do you know if you're un-American?
    2. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

      That is sort of what I was thinking about. Though it would also be handy to modify not just the assembly code, but also somewhere farther up the food chain. Modify the loop-setup assumptions, etc. and look how that changes things. or change an instruction and let the compiler suggest a rescheduling for the rest. Basically, do the same things you do when coding asm, but let the compiler figure out the details.

      Yes it could get very boring, but you'd only do it for important functions, like asm (about as boring anyway). And it'd allow you to handle bigger functions.

    3. Re:Compilers dont write better code than humans by GnrcMan · · Score: 2

      Incidentally, the optimizer hint you describe is called "__assume" in Visual C, and it does exactly as you say. So your code would be:

      switch (x) {
      case 0: ...
      case 1: ...
      ...
      case 23: ...
      default: __assume(0); // Same as 'notreached'!
      }

      normally you'd have a define like:
      # define ASSERT(e) ( ((e) || assert(__FILE__, __LINE__) )
      #else
      # define ASSERT(e) ( __assume(e) )
      #endif

      in order to hide this, but you get the point.

      Which brings to mind an optimizer bug that I came across (and fixed) which had to do with a nested turnary operator with two assumes:

      __assume(foo() && x==something),bar(&x) ? dosomething() : x == foo ? somethingwierd() : __assume(0);

      That's complete nonsense of course but it had the same form. This expanded out to a huge affair in the intermediate code the optimizer uses (one of those embedded functions was inlined(!!)). It was fun. Oh well, I'm sure no one is interested.





      --GnrcMan--

    4. Re:Compilers dont write better code than humans by BinxBolling · · Score: 3
      You need to understand the code fragment you are coding to do the must useful optimizations which are algorithmic ones. A badly coded better variant of an algorithm will beat well coded, but inherenetly slower version.

      Absolutely. But one need not be programming in assembly to do the sort of algorithmic optimizations you speak of. In fact, in higher-level languages, where source code content is determined largely by the algorithm being implemented (rather than by the low-level details of the implementation, as in assembly language), you have a better chance of 'seeing' and successfully implementing a faster algorithm. This sort of thing is one of the big reasons why assembler loses in the long run.

      It seems to me that there are two basic types of optimization task:

      • Choosing the fastest algorithm for a particular task.
      • Making a given algorithm run as fast as possible on a given piece of hardware.

      Of course, this is a bit of an oversimplification, since it assumes that the fastest algorithm for a given task is going to be the same on all platforms. Usually, this is true, but probably there are a few minor cases where it isn't. But it works most of the time.

      At any rate, humans are much better than machines at the first type of optimization. But machines are much better than humans at the second (even if only because normal humans don't have the patience required to do the enormous amount of repetitive analysis that it requires).

    5. Re:Compilers dont write better code than humans by interiot · · Score: 1
      My wording was a bit weird...

      Say you have this in code: "const int NONDESCRIPT = 255; a = b % NONDESCRIPT;"

      But you don't care that NONDESCRIPT is exactly 255, just somewhere around there. So if the compiler knew this, it could change NONDESCRIPT to 256, making parts of the program much faster.

    6. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0
      As the author of that texture mapping article, let me just say that I believe that 90% of the interesting optimizations you can express in assembler COULD be exposed to the compiler using an assertion mechanism--the compiler verifies the assertion in debug code, and optimizes on the assumption it's true in the optimized code.

      For example, given a switch() statement, C compilers which generate a jump table have to range check the index to make sure the index doesn't go out of the jump table, since that would cause the program to jump into random memory. With a built-in assert and a compiler designed to optimize for it, I could say something like (forgive my indentation, but slashdot doesn't support <pre> apparently, and geez, and I have to type &lt; instead of <, how does anybody type code, I thought this was supposed to be good for geeks, harumph, geez, AND If you preview it it removes the & encodings... well, ok, enough whining):

      assert x >= 0 && x < 24;
      switch (x) {
      case 0: ...
      case 1: ...
      ...
      case 23: ...
      }

      or even better

      switch (x) {
      case 0: ...
      case 1: ...
      ...
      case 23: ...
      default: assert 0; // Same as 'notreached'!
      }

      It's also possible to manually add instrumentation for the compiler about what trade-offs you'd like it to make. I saw a discussion of this in comp.compilers many years ago, and my recollection was that it was something people experimented with with FORTRAN compilers and found that people wrote wrong hints and worse yet hints stuck around in code that had changed and for which the hints were no longer valid.

      On the other hand, for those of us who DO have to go to assembler occasionally (well, ok, I personally don't anymore due to 3d hardware accelerators), maintaining compiler hints and asserts would be way more pleasant than writing assembly code.

      Sean Barrett, Looking Glass Studios
      buzzard@world.std.com

    7. Re:Compilers dont write better code than humans by Zagadka · · Score: 1

      I have already heard that assumption that a compiler can generate better code than a programmer a thousand of times, but it does not get more true by repeating it - it is false. At least until compilers are able to understand the program. In every program there are things the compiler simply doesnt know. For example, in x86 assembler it is possible to save some cycles (and memory accesses) by using 16-bit or even 8-bit registers instead of 32 bit registers. You can double the number of registers by doing it. You need to be sure that the values in the registers are below 65536 or 256 to use these tricks, and the programmer can know this, but the compiler cant. The compiler might profile the possible value range if it is a really advanced compiler (I never heard of any compiler actually doing this), but it cannot be sure so it must at least check the values before for the case that they arent.

      I agree with what you're saying to an extent. Compilers do generate worse code than a good programmer in some cases, because the compiler doesn't necessarily know which corners are safe to cut. You picked a very poor example though. If I know my value will always be between 0 and 255, or 0 and 65535, I'll pick an appropriate data type. Then the compiler knows what the range will be, and can choose appropriate registers. Some languages (Pascal, and Ada I believe) even allow you to specify a range of numbers as a type.

      Some high-level languages often don't let you describe problem solutions in certain domains very well. For example, the array summing thing that was brought up yesterday. In C, you'd typically use a for-loop. That implies doing things sequentially, even though it doesn't matter. It takes a fairly sophisticated compiler to realize that the order of the additions isn't important, and so vector addition instructions could be used.

      The problem is that we're telling the compiler a lot of stuff that isn't really important, and it has to figure out which parts are safe to ignore. This is why specialized high-level languages tend to do so well in their domain. You tell them only what's important, which is easier on the you, and they can generate highly optimal code because they don't have to second-guess what you said.

    8. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

      Are you high or something? You do realize that the Crusoe is NOT an x86 processor - its a VLIW processor. The architecture is closer to the intel Merced (or whatever it is now), which ALSO requires extensive processing at the compiler side. These aren't the traditional superscalar architectures like the pentium and its peers. The VLIW chips have simpler on-chip logic, eliminating the complex and sometimes slow logic found in superscalar cpus that extract instruction level parallelism out. The VLIW architecture has explicit parallelism in each instructions - the merced for example uses 128bit instructions, each of which contains 4 32 bit instructions (hence the name, VLIW - very long instruction word). Sorry, but these new processors are very complex to program by hand - managing not only the registers and instructions, but making sure that the particular arrangement in the word will work well in relation to preceeding and following instructions is very very hard. It's no longer the 80386-style world of hardware with 8,16,32 bit registers, 8,16,32 bit instructions, etc... People finally are pulling their heads out of their rear ends - give it a try sometime and learn about the processor before commenting.

    9. Re:Compilers dont write better code than humans by dingbat_hp · · Score: 2

      16-bit or even 8-bit registers instead of 32 bit registers. [...] need to be sure that the values in the registers are below 65536 or 256 to use these tricks,

      Isn't that trivial old-tech for compiler optimisation ? If your type system is strong enough to know that it's an 8 bit operand, then you're only ever going to need the 8 bit register.

      You can argue that multiplying two int8 only really needs an int16 result if the initial values are big enough, which is fair comment in favour of human optimisation. OTOH, it's not a rocket science extension to the type system to allow an int4 type to be declared, even if it's always held in an >=8 bit register and the compiler can then safely optimise that to place the result back into an int8. That way you even improve the code quality, as there's no discrepancy between the valid and the apparently valid numeric ranges, as you get with the human optimisation and the non-obvious limiting of the initial int8 to be <0xF

      Anyone remember BCPL ? 8-)

    10. Re:Compilers dont write better code than humans by tjansen · · Score: 1
      Isn't this why we have short int, long int, and char data types - so we can tell the compiler these things?

      Even if your language supports short or char arithmetics (C and Java do not, they convert everything to integers first) and would use the registers that way, this is not all you can to to optimize the code. In the texture mapping inner loops of the article is a piece of code where 2 16-bit numbers are put into a 32-bit register, and then two 32-bit registers are ADDed to save an add. In this situation overflows of the 16-bit parts are critical, but an overflow of the lower 16-bits are not as bad as an overflow of the upper 16-bits. If the lower 16-bits overflow, you will only lose the precision of the least significant bit of the upper 16-bits. So when accuracy is not so important, you can ignore the overflow of the lower 16-bits and loose 1 bit of the arithmetic precision in the upper 16 bits. Try to explain this to your compiler...

      Of course, you can use assertions and similar things to tell your compiler more about your code, but then you agree that the compiler is worse than the human programmer, because it can only produce better code when the human takes a larger part of the compiler's job.

    11. Re:Compilers dont write better code than humans by AndrewHowe · · Score: 0

      This is a language issue, not a compiler issue.
      For example, C lets you declare a variable as a char, short, long etc.
      Many x86 compilers will do the optimization you described in this case.
      Yes, a certain class of optimizations comes from putting limits on the range of your variables. Unfortunately, this often leads to stupid and unacceptable limitations. It's much better to have a language where there isn't any inherent limit (or {32|64|...} bits is "enough"). If you can also tell the compiler what range of values are legal, it can optimize as necessary.
      Please note that using 8 bit registers is in many cases *slower* than using 32 bit registers on the Pentium Pro and above. Yes, you get more of them, but the performance penalty can be quite high.

    12. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

      >The compiler might profile the possible value range if it is a really advanced compiler (I never heard of any compiler actually doing this), but it cannot be sure so it must at least check the values before for the case that they arent. We fall down to the topic where Tim Sweeney fails - what kind of language can be profiled in the way you describe. To my opinion, you (for sure) can profile Ada for variable sizes. Creators of El76 (Soviet version of Ada-like language for Elbrus computers) claimed that it is possible to take a program on El76, pass it through verifier and get fact that given range of inputs program will output given range of outputs (un)proven. And I think that most of programs written on functional languages can be proven for that things. But writing optimizer for complex language is harder than for simpler one, if we take an assumption that complexity of any given compiler is constant. ;)

    13. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

      for shitty processor like x86 but what about massively parallel processor ? Do you think you know more about instruction ordering than the compiler knows ?

    14. Re:Compilers dont write better code than humans by Sir+Banana · · Score: 1

      I don't think that this is the case - of course humans can always write the same code that a compiler generates, given the time, and if you complile to asembler on some compilers, the code is quite easy to squeeze an extra couple of optmisations from.
      On the other hand what compilers have is time, to get the best optimization may take an algorithm a long time so no current compiler would every use that algorithm. I remember a friendly telling me of algorithms he had come across with a runtime measured in days. The perfomance benefit from running them isn't worth the time to run them, but there is the potential for compilers to be better than any normal human would have the time to write.

      The importance of compilers in producing fast code is increasing already. The x86 line of processors has a lot of silicon dedicated to branch prediction etc but from what I have read of the Merced line the processor relys on the compiler to order the instructions optimally and to include extra information in the op codes for branch usages. I certainly wouldn't want to have to start coding that myself.

      --
      -- "Outside of a dog, a book is a man's best friend. Inside of a dog, it's too dark to read."
    15. Re:Compilers dont write better code than humans by jkain · · Score: 1

      A lot of people have been saying this is a bad example because in such cases you should just choose an approriately wide type to store variables which you would like stored in smaller registers. Actually, I think this is a very good example of when a human programmer can outperform a compiler. It may be the case that a variable is declared to be of a certain type because durring most of its lifetime it can occupy the file range of that type, but a certain specific points durring execution it can be known to fit within at particiular smaller range, say 0 to 255. At this point on an x86 the upper byte of the register holding this value could be used for other purposes, until the full range of the register is required again. It may be possible to get the compiler to produce code like this with a cast, but I suspect you would end up with some ugly conversion code slowing you down. Joseph Kain

    16. Re:Compilers dont write better code than humans by Quaternion · · Score: 1

      I feel like the assumption is that compilers can generate better code than a human programmer, not that they do. It seemed to me like the submission was kind of a mission statement for compilers/JIT-programs, a vision for the future of which Crusoe is perhaps a new iteration.

      But clearly, a knowledgeable, good assembly coder can still beat a compiler some or most of the time.

      --

      "The horse leech's daughter is a closed system. Her quantum of wantum does not vary."

    17. Re:Compilers dont write better code than humans by killmeplease · · Score: 1

      I know people tha can write really fast assmebly, but they cannot write nearly as much as me and my C compiler, and I am not the greatest C coder.

      --
      - Kill Yourself, spare us all! -
    18. Re:Compilers dont write better code than humans by Junks+Jerzey · · Score: 2

      No.

    19. Re:Compilers dont write better code than humans by Mr.+Slippery · · Score: 4
      You need to be sure that the values in the registers are below 65536 or 256 to use these tricks, and the programmer can know this, but the compiler cant.
      Isn't this why we have short int, long int, and char data types - so we can tell the compiler these things?
      --
      Tom Swiss | the infamous tms | my blog
      You cannot wash away blood with blood
    20. Re:Compilers dont write better code than humans by kulturkritik · · Score: 1


      There are compilers that have direct support for this, but they are compilers for higher level languages, not C. Or C++. If you raise the abstraction level, you give the compiler more opportunity to optimize low level details. Telling the compiler how to optimize ("hinting") instead becomes giving the compiler more leeway in translation of your highlevel code to assembly language. You just have to make fewer assumptions about how your program will look once compiled.

      Until we ditch C/C++, really intelligent compilers will not happen.

    21. Re:Compilers dont write better code than humans by kulturkritik · · Score: 1


      I think you mean: "the more abstract a language, the *more* win from compilation".

      Right?

    22. Re:Compilers dont write better code than humans by bogado · · Score: 1
      Well I guess you're probably rigth but the article does have some interesting points and I guess that at least your example is, in my opinion, wrong. After all I think that a compiler usualy knows the range of the data, at least in strong typed languages like C. Of course there are variations in plataforms but usualy there are aways a way to say that a certain variable is 8, 16 or 32 bits long. So all the compiler have to do is to use the correct type, and the compiler will use the correct register size.

      Well, also I understand your point and I am shure there are more then 10000 cases that a human should know his program better then the compiler, after all he is inteligent. But then again there is always ways to make the program easier to be optimized by the compiler.

      What I mean here is that, probably, the compilers could create programs that are much faster then they create today. And I am shure that there could be standard ways to hint the compiler what the programer wants with his code so that the compiler could create even more optimization.


      --
      "take the red pill and you stay in wonderland and I'll show you how deep the rabitt hole goes"

      --
      []'s Victor Bogado da Silva Lins

      ^[:wq

    23. Re:Compilers dont write better code than humans by nester · · Score: 1

      imho, more generic (ie, abstracted) code, while allowing the compiler more freedom, the code is less detailed, thereby giving the compiler less specifics. that is, a less detailed version of the program, which means the compiler has less of an idea of what you want to do. in other words, as you reduce the resolution of the code, you're making it less specific (more generic) and less optimized for the given task, and gives the compiler less hints about what you really want to do. the only advantage of even higher level languages is coding time. for simple programs, that's great, but for code where performance is more critical (and in the common case, i believe), it sucks.

    24. Re:Compilers dont write better code than humans by AndrewHowe · · Score: 4

      In some ways I agree with you, OK, I love writing assembler stuff.
      But I'm trying to separate fantasy from the reality here.
      The fantasy is that you're some super studly asm c0der who can not only produce properly scheduled code, but also has the time to *completely re-write* it:
      a) For each new processor
      b) For each sub-type of that processor with different internal resources
      c) Every time the specification changes - Maybe you have limited the range of a variable or something.
      Any of these changes will require a huge amount of work. Even a small change, coupled with the pipelined nature of modern processors, means a large modification to re-establish optimal resource usage.
      The reality is that no-one has the time to do this, and even if they do, they should be doing something else instead. High level optimizations almost always beat low level tweaking.
      In addition, I simply don't believe it's impossible to
      a) Tell the compiler in more detail how to optimise your code
      b) Have the compiler suggest ways in which it could produce better code (for example, "Is it OK to assume there is no aliasing of this item?" or "Hey, if I make this variable 8 bits in size I can do this huge optimization")
      c) Have the compiler work this stuff out for itself
      d) As (c) but at run time, guided by profiling information.
      Some of those are hard to do right now, but I don't see anything that would make them impossible.
      Does anyone?

    25. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

      If you're certain that you're only going to use 8 or 16 bit numbers, why not declare your variables char or short int? Use smaller variables, and the compiler will then know.

    26. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

      I've yet to find any code that any compiler could beat me on, and here's why. You folks assuming that a compiler can perform all the optimizations that a human can. But a *good* assembly programmer will see things a compiler just can't see. You need to understand the code fragment you are coding to do the must useful optimizations which are algorithmic ones. A badly coded better variant of an algorithm will beat well coded, but inherenetly slower version. A good programmer will see this and apply a new algorithm and optimizations, based on the performance of the existing code. There are something where today's compilers can get close to my ASM abilities. Things like shuffling large blocks of memory around, etc. This is mostly due to the fact I don't care to code this in ASM if I can help, and when I have to, you tend to get lazy (it's VERY boring code). Humans can do other things as well. The appliciation of MMX/3DNow!/SSE/Altivec, etc. These are all notoriously hard on compilers (witness the miserable compiler support for any of the SIMD instructions). Something I hear a lot of people talking about here is scheduling issues and knowing the clock cycles of various instructions. You folks have to be joking that a human can't do this. A good ASM programmer can take an entire subroutine and tell you where cycles are being wasted, what does and doesn't pair etc. It's *really* not all that hard, *if* you know what the hell you are doing. Part of me will be glad when a compiler can beat me, so I don't have to drop down to asm level to get the maximum performance, but that day isn't here yet, and I don't think it's from lack of trying on the part of the compiler people. Code generation is in many ways like speach recognition, simple for the human brain, difficult to implement with a fixed set of rules.

    27. Re:Compilers dont write better code than humans by Namarrgon · · Score: 3

      It's just like Deep Blue vs. Kasparov again. Brute force compilers vs. intuitive & creative humans. In some cases, humans can write better assembly than compilers (thanks to knowing more about what the program is intended to do), and in others the compilers have the advantage (thanks to being better at scheduling instructions on today's pipelined, superscalar CPUs). JIT compilers could have even more knowledge about instruction timing, but have less time to think about it. Kinda like Deep Blue playing blitz chess. I'd say in most cases these days, a compiler could write faster code from scratch than a human could, unless the human really spends some time on it, but the optimal combination will always be a human assisted by a compiler (and profiler, etc). Best of both worlds. Namarrgon

      --
      Why would anyone engrave "Elbereth"?
    28. Re:Compilers dont write better code than humans by GnrcMan · · Score: 4

      That may be true of a crappy compiler on a lame platform (x86)...but I'd like to see you try reordering and slot scheduling Alpha instructions by hand to enable optimal pipelining and branch prediction.
      As someone who worked firsthand on a compiler for the Alpha, I can tell you that for 99% of the cases, the compiler can perform better general optimizations than a human. That 1% left over is what you should be concentrating your efforts on.
      Historically, the point of a CISC processor was to reduce compiler complexity. So of course hand coding optimal X86 assembly isn't going to be difficult.

      --GnrcMan--

    29. Re:Compilers dont write better code than humans by ddunn · · Score: 1
      But that's beside the point: the fact remains that if we do spend the time to write the code by hand, we can always do a much better job than the compiler.

      This is only true for a subset of the programers that know the machine well enough. Current machines are forced to be easy for idiots to hand tune. Because they run old hand tuned code.

      Imagine if you were required to ensure that ALU0 was never reading a 2 cycles bypass from memory while ALU1 used the shifter.

      "What's a bypass?" Exactly.

      Now imagine that there are about 800 of these rules. And they change at every chip rev. It sure makes life easy for the HW guys, but do you really think you can consistently beat the compiler? Various academic machines have had these types of rules, but they never took off. Mainly because there are to many studs and not enough engineers.

    30. Re:Compilers dont write better code than humans by locust · · Score: 5
      I have already heard that assumption that a compiler can generate better code than a programmer a thousand of times, but it does not get more true by repeating it - it is false.

      The metric that I given in my university computer architecture class was that a compiler can generate better code than 90% of assembly programmers. Which, when I examine higher level code written by a variety of programmers, some good, some bad, makes sense. Some people know all the details of a language and some don't. Some pay attention and some don't. I'm pretty sure it was backed up by a reference to an IEEE paper somewhere.

      If large numbers humans could reliably write large volumes of high quality, optimized code, at high speed, we'd still be using assembly everywhere. But they can't. I certainly don't have the time or the inclination to learn all the ins and outs of every instruction set architecture I operate on to squeze every last bit of juice out of the machine. And I certainly am unwilling to give up the expresiveness of a highlevel compiled language.

      But ultimately, the fraction of people that need to write things like texture mapping loops is very small, and the fact of the matter is that the loop is probably embeded in some higher level piece of code. So relax, compilers reliably produce better code than the majority of assembly programmers (maybe not on /.). Or more precisely, better machine code, than the majority of programmers, if they were asked to write assembly. --locust

    31. Re:Compilers dont write better code than humans by interiot · · Score: 1

      Yes, "a % 256" is the same as "a & 255".

    32. Re:Compilers dont write better code than humans by Anonymous Coward · · Score: 0

      Somewhat off topic maybe:

      It is true that programmers know more about the code, while it is also true that the compiler can do the boring parts, and count cycles faster than a human can.

      What I'd like to see is come sort of tool that takes, say, C code, and compiles it with you, so you can tell it at compile time what assumptions to make. You coulde see the asm, the scheduling, the cycle totals, and try out various assumptions/modifications, and then the tool does the rest (reorders insn to fit your changes, moves registers around, etc). Some sort of interactive compiler if you will.

      It would make it easier to optimize code, and also to port it, since you have the c source anyway. And it'd be more efficient than simply compiled code.

      Oh well, I guess I have an idea for my next GNU project :-)

  123. Re:Why not? by Anonymous Coward · · Score: 0

    and, so what? If the performance spec is "do this in X seconds", and it's done below that time frame, then isn't that good enough?

    Now, is it possible to make these programs unlearn their optimizations, or are they optimiztically stubborn?

  124. Games and Java by cstaylor · · Score: 1
    This isn't very practical for games though. Typically we want full control over the memory. We know the upper limit of what we need, and we don't care much about other applications. So we can allocate a large chunk at start time and then work within that, to avoid the new/delete/malloc/free overhead. If removing "antiquated" things like pointers is the best solution, then we'd all be writing games in Java. OK well that's a stretch but you see my point...

    Memory allocation isn't your problem for game programming in Java... to paraphrase James Carville, "It's where the graphics are rendered, stupid" (no insult intended). You can do all of the memory allocation at the beginning of your application (I do it all the time with byte[] buffers for I/O), just like you would in C/C++. If the graphics rendering code is in Java, or uses software OpenGL renderers (like running Java3D in Windows NT), the performance sucks.

    I think people have seen Applet-based games in Java and think "this sucks". Those Applets most likely use the AWT or Swing for their graphics rendering, and you see poor performance.

    Just my 2-cents.

    -Chris

  125. Compiled Once vs. Compiled multiple times by JWRose · · Score: 2
    If I'm understanding what you're saying correctly,
    a JIT compiler will compile the exact same code each and every time you go to run it. While a convential compiler will compile it one time. Yes the JIT compiler might be able to tune the performance better for each run, but I think it takes a performance hit at the start by having to do the compilation. While the JIT compiler is compiling code, the exact same code compiled with the convential compiler is already executing.



    The faster the hardward becomes though, the differences between the two might not be as noticible. Although, the more powerfull the hardware becomes, the more complex the code becomes.

    Nothing exists exept atoms and empty space; everything else is opinion.

    --

    blah blah blah....
    1. Re:Compiled Once vs. Compiled multiple times by buckaroo-b · · Score: 1

      transmeta worked around this by caching frequently used code so that it does not need to be "morphed" over & over again, also it spends more time on frequently used code trying to optimize it so performance increases w/ repitition. pre-compiled code cannot do this, it only get one shot.

      this doesn't eliminate the performance hit compiling multiple times, but it significantly reduces so that the loss may be equalled, or even surpassed by the gains from running the optimized code.

      i know morphing != JIT coompliling, but there is no reason the same concept cannot be applied, unless, of course, they've got patented!.....

      no!, not the red button!!!

      --

      i have walked down train tracks, walked down train tracks, drunk at 3 a.m. it not magic, it's no great trick, w
    2. Re:Compiled Once vs. Compiled multiple times by John+Allsup · · Score: 1

      Nope -- the JIT compiler compiles each .class file into native machine code each time the .class is loaded afresh. Further, it may (at its option) recompile the methods of the .class using different parameters (found by profiling).
      John

      --
      John_Chalisque
    3. Re:Compiled Once vs. Compiled multiple times by MikeBabcock · · Score: 2

      Once you've implemented a caching interface _at all_, you abstract the maintenance of that cache and allow the maintenance system (thread, whatever) to deal with keeping MRU or LRU indexes, or whatever modern DB-based system you want to use for good caching efficiency. In fact, you could try several over time (in a second, how many could a 700Mhz machine try?) in a small section of your cache and evaluate efficiency of each, on average, for a given user -- that's the point of this Chip+Software deal anyway.

      Upgrade the caching maintenance code later if you're finding its not good enough ... just requires flashing the bios :-)

      --
      - Michael T. Babcock (Yes, I blog)
    4. Re:Compiled Once vs. Compiled multiple times by ucblockhead · · Score: 2

      If you kept the entire compiled code around (and not just part of it), you could keep profiling info, and only attempt further optimization the Nth time a block was run in a certain period, were N is some sufficiently high number. You could also be really smart and save the compiled code so that next time you execute, you don't need to recompile everything.

      Imagine a program that speeds up the more times you run it...

      Perhaps you could set some sort of "optimization threshold", which limits the percentage of time the system tries to improve over time. The first time you run, there is the complete overhead of compiling. The second and subsequent times, a 10% slowdown for optimization. After a few runs, with the data collected, the performance improvement beats the 10% overhead and you are better off.

      --
      The cake is a pie
    5. Re:Compiled Once vs. Compiled multiple times by MikeBabcock · · Score: 2
      The intelligent JIT compiler would do one of several things:
      • Cache the compiled code on the hard drive for each program executed.
      • Cache the compiled code for frequently seen sections (like fast symbol tables for headers in C compilers).
      --
      - Michael T. Babcock (Yes, I blog)
  126. Genuine Question: have you ever compared?? by orpheus · · Score: 2
    It's been over a decade since I did any appreciable assembly coding (I quit after the 80286), but I have a serious question, and I'd be interested in serious answers:


    How many of you actually routinely (e.g. 6 times a year) examine the output of your compiler, and compare it to what you would have written?


    I've done it often (a promise I made to myself as a foolish youth) and learned quite a bit. It's actually changed my higher-level coding technique on a few occassions (and I'm not just talking about optimizing for a better fit with a specific compiler)


    Not only have I never met another person who does it, but I've almost never met a programmer who didn't look at me oddly for even mentioning the practice. And yet, here are so many people walking as if they did it routinely. I guess that's just the Silicon Valley edge.


    I'm not being sarcastic or snide. I really want to know. Maybe I should give up my current profession (ditching a decade of grad and post-doc training) and move to the West Coast, where Programmers are Real Programmers.


    Or maybe I should finish reading my NEJM, and we should all just keep in mind what we really know, and what we only think we know. Again, no disrespect, Lord knows I've had to re-think the extent to which I "knew" things before.


    Please post your honest answer, and encourage your friends to do so as well. I'm afraid I'll have to take a lack of responses as a confirmation of my worst fears -- that everyone is only guessing.

    --

    If you can go to bed, knowing you did a valuable thing today, you're very lucky. If you can't... it's not bedtime

    1. Re:Genuine Question: have you ever compared?? by Ninja+Programmer · · Score: 1

      > How many of you actually routinely (e.g. 6 times > a year) examine the output of your compiler, and > compare it to what you would have written? Well, speaking for myself I certainly do this. On the other hand I have already moved to the west coast. :o) You bring up a good point. People who have not seriously done this, let alone people like me who do this all the time, will have a hard time estimating the performance of a JIT or Crusoe let alone their compiler. So after inflating my ego, I have nowhere else to go but to present my authoritative opinion. :o) IMHO, Crusoe has its work cut out for it -- though they have a few aces up their sleeve, Java/JIT has dug itself a hole it may never climb out of and modern compilers still need work. -- Paul Hsieh http://www.pobox.com/~qed/

    2. Re:Genuine Question: have you ever compared?? by Xeger · · Score: 1
      When I'm writing time-critical code I use a different approach: first I write it in my high-level language of choice, then I gank the compiler's output and see if I can better it.



      But you know what? I find myself doing that less and less, since the amount of code I consider "time critical" gets smaller with every passing day. Furthermore, ever since those funky, dancing bunny-suited good-for-nothings introduced multiple pipelines, superscalar, out-of-order and speculative execution, things have gotten a whole lot harder for the poor assembly programmer.

  127. Compiler will be faster than JIT by Marco+Schramp · · Score: 1

    I completely agree on your first subject (compiler generate faster code than handmade assembly). There's hardly any dispute (except maybe for very esoteric applications that must use very dark corners of a processor).

    With the JIT vs. compiled language compilers I must disagree. The JIT-compiled language will never get faster than the chip will compile for. JIT compiler are generally in a hurry: it hasn't been compiled before and the user wants it executed NOW. Pre-compiled application have their compiler sitting a few minutes in silence to figure out the best instructions to use. Who will come up with the fastest code? Yep: the compiler.

    The memory allocation issue that you were thinking about is only a small portion of performance and optimal code. Stack or heap allocation/de-allocation algorithms are well understood problems. These issues have been solved before. Compiled vs JIT-compiled will make little difference.

    Marco Schramp

    1. Re:Compiler will be faster than JIT by John+Allsup · · Score: 1

      The why are the fastest numerical libraries the hand-coded ones? answer: The people doing the coding know things that the computer couldn't possibly. Current languages don's have the structure/semantics/etc. required to convey various mathematical know-how required to clever optimisations (how many compilers will recognise an operation similar to a discrete Fourier Transform and put an FFT in place -- you should note that large number multiplication usually uses such tricks)
      John

      --
      John_Chalisque
    2. Re:Compiler will be faster than JIT by John+Allsup · · Score: 1

      Dont be so sure. Compilers typically generate static (i.e. unchanging) code. At best this can be statically optimal for a given task.

      However it has been demonstrated (I think that some people at MIT have done this, among others), that statically optimal code is not necessarily the fasted, and dynamic optimisation can increase speed further. (The compiler DOESN'T know what the JIT can find out during runtime, and no amount of minutes spent sitting and thinking can change that)
      John

      --
      John_Chalisque
    3. Re:Compiler will be faster than JIT by Anonymous Coward · · Score: 1

      Actually, you're wrong on both counts.

      A moderately skilled assembly programmer can still write code that's faster than the code a compiler will generate without too much difficulty. Granted, it's bit tougher than it used to be, but it's by no means impossible or even unusual.

      Second, a just-in-time compiler can outperform one that does compilation as a separate, pre-runtime step. This is because the JIT has access to runtime profile information that the static compiler doesn't.

      Look at the papers on the Self project website -- they took a ferociously dynamic OO language and got its performance within a factor of two of C. Because they a) they had the profile information of the application as it ran, and b) had the compiler run transparently at runtime, they could do things like use the profile information to do inlining and other interprocedural optimizations, without fear of code bloat.

      If you tried to precompile Self code, then your performance would be abysmal -- it would run at the speed of something like Python or Perl. This is because there are very few static invariants that a compiler can use to optimize the program.

  128. Re:Why Optimize to overkill? by mikera · · Score: 2

    Well, a good optimization typically makes one type of instruction or sequence of instructions a fair bit faster (say 30-50%).

    While this might give a 2% improvement across average code, it will make a lot of difference if this particulr optimization helps out your inner loop. There are times when you really need this.

    The way I see it, if compilers can optimize effectively, then there becomes less need to hand-code an inner loop in assembly. Decent optimising compilers may not give you a great deal of extra speed but they sure as hell give the hardcore coder an easier life.

    BTW, The rate of hardware improvement is a good argument *unless* you are trying to stay at the cutting edge. In these cases, optimizations are certainly not trivial.

  129. possibly offtopic by speek · · Score: 2

    Talking about dynamic recompilation and program profiling, and specifically about Hotspot, I've always wondered why I have to put up with the fact that between program runs, the compiler "forgets" everything it learned during previous runs? Why can't the compiler generate a log for each program, and review the log when the program is run again, so the compiler can pick up learning where it left off?

    --
    First, make it work, then make it right, then make it fast, then, make it bloated!
  130. within X% of the performance of native code by The+Pim · · Score: 2

    [slashdot bug reformatted the last post] As the microkernel unix people used to say before they finally died out, "We are within X% of the performance of the original". - Alan Cox The JIT guys can write all the papers in the world about how they should (yes, really should) be able to get native code speed. No, that extra level of indirection is really a performance benefit, not a hindrance! I'll believe it when I run Java bytecode and it's not dog slow.

    --

    The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
  131. Re:Not so obvious by Anonymous Coward · · Score: 0

    programming & debugging time should be thrown in as well. It's better to underhype and overdeliver than overhype and underdeliver...

  132. The _REAL_ Reason by SamBeckett · · Score: 1

    There is a _good_ reason why Java is so slow, and this should be immediately obvious to any programmer in C or Assembly...

    Java has _A LOT_ of type checking done during run time. If you overwrite an array in Java, it throws an Exception.. If you do it in C or C++, no one really gives two **its..

    There are more intricies with Java that make it impossible for the code to be as fast as assembly because the language REQUIRES that these run-time checks be performed. If it doesn't do the checks, it isn't Java.

    So sorry, no matter how advanced your compiler is, if it is Java it will always be _slower_ than a compiled language with not as nearly as many type checks.

  133. Re:Profoundly counterintuitive? by bhurt · · Score: 2

    First off, there are profound performance differences that can be acheived simply by optimizing for the Pentium Pro and not the Pentium, see The Twofish Implementations. This is mainly just instruction ordering issues- possible to do in a JIT. There's more to optimization anymore than simple cycle counting.

    Second, memory management is a different beast under object oriented programming than under procedural programming. The holy grail of object oriented programming (as it were) is black box reuse- allowing a class maintainer to modify how a class works without those using the class needing to change their code. Not having garbage collection in an OO language either disallows true black-box reuse (which is a maintainability issue), forces the object to implement it's _own_ garbage collection (such as the C++ STL template basic_string does- and this is generally the worst sort of garbage collection, such as basic_string's reference counting), or forces the program to leak memory like a seive. If those are the options, then garbage collection is not too high a price to pay.

    Especially when you consider that GC isn't that high of a cost. _Every_ memory management scheme that isn't static in nature that I've seen is O(n). Either on the allocation or on the free. Sooner or later you have to search for a hole. GC doesn't remove this penalty, but instead it allows you to _defer_ it until a better time. Since most programs spend most of their time waiting for input, there are lots of places where garbage collection can be free because the program isn't doing anything else.

    Last, but not least, I'd like to point you at the The GC FAQ, a good resource for all things GC related. Highly recommended reading- after it, it was "intuitively obvious" that the world was flat, that the sun went around the earth, and that time didn't slow down as you went faster.

  134. Why JIT is slow by 31eq · · Score: 2

    When you run a Java program, the JIT has to do its thing before the program starts. That means the application takes longer to launch. For a lot of GUI apps, this is exactly where the greatest bottleneck is.

    So the kind of optimization suggested probably wouldn't improve matters for the user, at least with the small, visual applications Java was supposed to be good at. However, it suggests the people using it for numerical work may not have been so mad after all.

    If it works.

    Of course, you could cache the compiled code, so that it launches better second time. Pretty soon, you'll end up re-inventing compile/run cycles.

    1. Re:Why JIT is slow by John+Allsup · · Score: 1

      The major difference in the compile/run cycles will be that there is no (this is source, this is intermediate, this is compiled) dividing lines -- anything goes :-)
      John

      --
      John_Chalisque
  135. Running Java != Browser by Anonymous Coward · · Score: 0

    You just need a java runtime to run a java application (not applet), like you need a perl interpreter to run perl scripts. If you get the application as source, then you also need the compiler (not to be confused with JIT compiler).

  136. Re:compiler smarter than humans? by CFN · · Score: 1

    A compiler is just a machine that transforms a parse tree into machine code by following certain rules, so yes, there will always be situations where an ASM coder can best a compiler on a small section of code.
    However, the strength of a compiler is that it can automate tasks which would be very prone to error if done manually, such a loop unrolling, instruction schedulling to avoid branch and structural delays, etc.
    Today's large software systems, > 10 million instructions, would have taken 100 times longer and cost 100 times as much if written in ASM by hand. For hard real-time systems, where every cycle counts, humans will still need to write ASM, but for almost every other application, using a compiler is much better.
    rmstar - the following is not directed at you, but to the slashdot community in general and I just included it here to avoid having to post twice.
    Often, especially on technical threads, many slashdotters post on topics that they seem to not know anything about (i.e. someone above who did not know that compilers performed optimizations,e tc.).
    Everyone is entitled to have their own opinions, and not everyone has the same degree of expertise on a particular topic, but it is a waste of time for someone who has no idea what a compiler or assembly language is to post their "thoughts" on this topic. Ask a question, yes, but if you don't know what you are talking about, please don't write about it.

  137. Profoundly counterintuitive? by TheDullBlade · · Score: 4

    Seems profoundly intuitive to me.

    First of all, a JIT is rushed. You can design your optimizing one-time compiler to look at the code for better ways to do it all day, if you like, and while the developers will moan, they will still use it if gets them better results.

    Secondly, Java has a very specific standard (that is typically fudged... but that's beside the point). It doesn't give any leeway for a program to act a little different in boundary conditions, like C does. In C, an int is whatever size int is most easily handled by the target system; if you need a certain exact size of int, you can code that in with some extra effort. In Java, an int is a Java int and Java doesn't care in the least what the most efficient native int format is.

    Thirdly... automatic garbage collection is less efficient than hand coded allocation and deallocation, and dynamic allocation is less efficient than static allocation. There are odd cases where this is false, but they generally hold true, and where they do not, the C version can always use the more efficient method anyway. (and extremely fast calls to "new" can be easily achieved... at roughly 50% memory wastage)

    Fourthly: Java locks you into a OOP model which is inherently inefficient (at least as done in Java). All function calls must be dynamically redirected etc.

    I could go on, but it feels like trying to describe that water is wet and stones sink in it to someone who thinks it's intuitive that water should sink into stones.

    However, I will not dispute that you can definitely get better results by recompiling for each chip that something will run on. Some JITs may use this to push out ahead of one-time compilers.

    BTW, an experienced assembly coder will always beat or at least equal the optimizing compiler, because if nothing else works he can always look at what the compiler produces and see if he can improve on that. Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?

    --
    /.
    1. Re:Profoundly counterintuitive? by tagish · · Score: 1
      Firstly I think you negated what I was describing as counter intuitive; I was actually agreeing with you that it seems counter intuitive that compiled code could be as fast as hand coded assembler. Sorry if that wasn't clear

      I'm inclined to dispute whether GC is inherently less efficient than discrete alloc / dealloc. I guess we could argue about it until the cows come home, and I certainly can't prove the point either way, but I've heard compelling arguments to the effect that, because a GC effectively deals with multiple deallocations at once there are economies of scale it can benefit from. My own experience with shifting heap allocator certainly supports this view.

      Of course, as I pointed out elsewhere, if there is any performance advantage inherent in garbage collection you could just re-implement your alloc/dealloc manager in terms of my garbage collector and make your dealloc call a NOP.

      I agree that current Java implementations are in part enslaved by Java's requirement for dynamic despatching, but even some current compilers are capable of determining many of the cases where methods can be despatched statically. Smarter compilers can mop up most of the remaining disadvantage.

      I still don't see anything here that means that Java (and other p-code language systems) must be inherently slower than hand coded assembler. Nothing technical anyhow. Economic issues are clearly another thing...

      Incidently, although I like Java, I didn't set out to advocate Java + JIT as an alternative to other languages. I don't find language wars productive. On a good day I might find myself using ARM Assembler, x86 Assembler, PPC Assembler, Perl, C, C++, VB, LotusScript, Java (OK, that would be a very good day).

      --
      Andy Armstrong
    2. Re:Profoundly counterintuitive? by Anonymous Coward · · Score: 0
      Ludicrous! You speak as if Java were intended as a vehicle for sound card drivers, space shuttles, or other terrifyingly dull (but admittedly useful things).

      Java, while having drifted from the initial premise of "Oak," has a fairly well-defined sphere, clear (at least) to anyone who has been charged with and created a large and complex piece of software over a nontrivial timespan and budget.

      Java makes coding easier, more efficient, and far less buggy, day by day, minute by minute. Java drastically improves coder performance, even if the coders in question are dyed-in-the-wool K&R C folk.

      Sure, Studly McCoder could write a GC as good or better than any commercial product. How likely is it, though, that Mr. McCoder is a member of your team? Pretty bloody unlikely.

      For the vast majority (9999/10000) of people/teams/companies/cases, Java will make your C++ programmers: faster, better, more consistent, less confused, and generally productive, happy folk.

      Admittedly, if you're writing weapons control systems, or rendering engines, Java ain't for you. It really isn't meant to be. Use the right tool for the job. If you're writing device drivers, sure, write in C. Write in assmebly if you fancy that. But when you have hundreds of contributors and millions of lines of code, Java shines.

    3. Re:Profoundly counterintuitive? by Kvort · · Score: 1

      >I disagree. That is what people have been told over and over again,
      but this is wrong. For example, you can solve the 8 queens problem in
      >500 assy lines, and there are a lot of problems in that complexity
      >range. And the fun factor is incredibly high.

      I'm not familiar with that problem.

      In the world of mathematical problems, its much better to take the time and bring your level of complexiy from n^3 to n^2, then to spend time shaving three cycles off n. The class of applications that really _need_ the speed difference are those real-time applications that use the full facilities of the machine you're working with. At the moment, this consists of mostly games and simulations. Even real-time video/audio compression does not really belong in this class: You can either run format X on this machine, or you cannot. Games will expand to fit the machine, and I believe they will continue to do so.

      But I digress. My point is, that the difference between an n^2 and (n-2)^2 algorithm is not really significant. And if you don't believe me, then find a book on time complexity, and have it explained by someone who knows the argument well... :) I learned enough to convince myself, and then forgot most of it.

      >>>>>>>>> Kvort

      --
      -Don't mind me, I'm personality-deficient and mentally-impaired.
    4. Re: Profoundly counterintuitive? by mch · · Score: 1
      Thirdly... automatic garbage collection is less efficient than hand coded allocation and deallocation, and dynamic allocation is less efficient than static allocation. There are odd cases where this is false, but they generally hold true, and where they do not, the C version can always use the more efficient method anyway. (and extremely fast calls to "new" can be easily achieved... at roughly 50% memory wastage)

      But in a dynamic language like Java, the compiler or runtime system can do these sorts of optimizations for you automatically.

      Fourthly: Java locks you into a OOP model which is inherently inefficient (at least as done in Java). All function calls must be dynamically redirected etc.

      Using class hierarchy analysis, many of these calls can be converted to static calls or even handled inline.

      Also note that since C exposes the memory layout to the user (with pointers), the compiler has to be very careful when doing optimizations. It cannot for example reorder structure fields or split structures into hot and cold parts.

    5. Re:Profoundly counterintuitive? by rmstar · · Score: 1

      I can tell you from experience that normaly, when you write assembly code, it turns out at least 30% faster than equivalent C code, no matter what you tell your C compiler - right from the start! If you know what you are doing, of course. After you spend some time optimising, this percentage grows.

      rmstar

    6. Re:Profoundly counterintuitive? by MikeBabcock · · Score: 2
      I was with you until your last statements:
      BTW, an experienced assembly coder will always beat or at least equal the optimizing compiler, because if nothing else works he can always look at what the compiler produces and see if he can improve on that. Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?


      The problem in your first statement is that you've defined three categories, not two. I would summarise as:

      Good ASM coder + Optimising compiler
      (is better than)
      Optimising compiler
      (could be better than)
      Good ASM coder

      After all, without the code produced the the assembler, the coder has to figure out the rest of the optimisations alone.

      The second problem is with your question of problem solving. You're probably thinking on a philisophical bent, because you and I both know that software can be written to solve dozens, if not millions of problems faster than we can (digits of PI, etc.).

      If it is possible for a programmer to reach 90% efficiency, but after 60% it requires doubling the project length to reach 75%, etc. but the self-profiling program can reach 80% efficiency in milliseconds, which is "better"?
      --
      - Michael T. Babcock (Yes, I blog)
    7. Re:Profoundly counterintuitive? by BurdMan · · Score: 5
      Firstly, water does sink into rocks. It then freezes and the pressure it exerts reshapes the world as we know it. But that is beside the point.

      I will address your comments with respect to the Java Programming Language and its HotSpot compiler technology. If you would like to enlighten yourself on the techniques behind HotSpot take a look at http://self.sunlabs.com. Transmeta, IMHO, was influenced by the same concepts when designing its code morphing techniques.

      1. "A JIT is rushed" - The first time bytecode is compiled to native code it must be done quickly to avoid delay. That code may be inefficient, true, but it can also be instrumented with profiler like information that can be used by later passes of the compiler. You see, what the author is describing is not a JIT system but a dynamic optimizing compiler which has the opportunity to study the program during execution and recompile parts or all of that program based on that information.
      2. "no leeway for boundary conditions" - Well here again you seem a bit confused and your example is naive. The native representation of a type is not fixed in Java, only the conceptual representation. To be a compliant Java implementation all results generated using 'int' (to use your example) must be consistent with the conceptual representation of 'int' when using the various operations. If the optimizing compiler finds a way to represent an 'int' in a more optimal way it is free to do so as long as the results of the operation don't change.
      3. "garbage collection is slower than hand coded memory management" - You really ought to read some of the latest garbage collection papers. This is no longer true even for collectors managing C or C++ allocation. When given an environment like Java which doesn't allow pointers and other antiquated memory techniques a good dynamic compiler with a modern garbage collector is both faster and more efficient than any hand coded attempt on all but the most simple of applications.
      4. "OOP and dynamic dispatch are inefficient" - Again, I urge you to read the papers at the Self site listed above. The way the Self, and now HotSpot, compilers work eliminates this bottle neck. I'll admit in early implementations of Smalltalk and Objective-C dynamic dispatch did add a small amount of overhead. Take a look at the documents on the HotSpot here and then tell me that dynamic dispatch is a problem.


      In summary, you are more than welcome to use assembly all you want. Code on my brother! But, please before you slam some other method try doing the smallest amount of research first. Maybe your snap intuition is wrong. You never know.

      As far as Transmeta goes, it has a lot of the HotSpot/Self style technology and I personally think that technology is the future. I can't wait to get my hands on a Crusoe powered product.

      -BurdMan
  138. JIT is not the same as Language performance by J.Random+Hacker · · Score: 2

    The problem that one tries to solve with JIT based language (say Java) is not the same problem that one tries to solve with a systems programming language (say, C) is not the same problem that one tries to solve with an AI language (say LISP).

    I would not try to write device drivers in anything but C -- assembly is too unportable. I would not try to write an AI program in C unless I had an essentially unlimited budget -- LISP is much better suited to that task. If I want to interact with a WWW user -- Java is the best choice thus far. As to effeciency of the resulting program, the speed of assembly code is irrelevant in 99.99% of code, since the program spends most of its time in that 0.01% (OK -- I'm exaggerating, but not that much ;-). Ease of programming and maintenance is much more valuable -- that observation led to the decision to use C to implement UNIX in the first place.

    "So what" you ask? While we talk about JIT and code morphing vs. compilation vs. hand assembly, we need to compare similar things. Language characteristics are nearly unrelated to performance (*implementation* is a whole other story...) In any HLL there must be some translation from strings of bytes to executable things, whether byte codes (emacs lisp), VM codes (Java), or machine code (any of the above under an alternate translator).

    JIT tries to solve the problem of bridging the gap between one (abstract) instruction set to another. Code morphing does the same thing. Compilation does the translation from HLL to machine code in one block. The time available to a compiler to explore different register allocation strategies and branch orderings is much larger, but less information about the typical case is available than for the JIT. However, branch prediction is the only place where repeated JIT/morphing has an advantage.

    BUT: I repeat -- they solve different problems. Compilers create programs to execute on a chip. JIT/morphing makes a program written for one chip executable on another.

    The use/non-use of garbage collection does not tell us a-priori that a program will be slower than the same program without GC -- that depends on the allocation behaviour. There is considerable literature (both theory and practice) on this topic, for languages such as C (gasp!), C++, and LISP. Probably Java too, by now.

    Compilers may miss optimization opportunities occasionally but the assembly code they generate is usually pretty good, and it is free of human-errors.

    Finally -- Alan Cox mentioned the microkernel performance mistake. I think that it applies here, but not in the way you might expect. It is really easy to miss the real cause of performance problems. In my experience, choice of algorithm and data structure (even in OO code) have a much greater impact on performance than any other single factor (or even several other factors together). You should use language features to help with your task -- but don't be sloppy with them. That always leads to bad, buggy code, and poor performance.


    Hmmmm -- still on topic? You decide ;-)

  139. Re:Match code frag. as opt. tech. (Was:Evolve code by GaspodeTheWonderDog · · Score: 1

    Actually most of what he proposed is well defined ways of compiling code as known today. At least trees and SSA are, I'm too lazy to look up to the post to see what else he wrote about. Neither of those however are particularly magical.

    --
    This space for sale
  140. What C Compilers/Language needs. by Legerdemain · · Score: 1

    For processors that are not made by transmeta, what the C Language needs is a way for the programmer to tell the compiler extra information.

    For instance if you have a 'case' statement or an 'if' statement. We should be able to tell the compiler which code path is going to be considered the most common path.

    Ive seen a good discussion of this in the kernel list. There are places in the kernel where goto's are used to accomplish what could be done if we could tell the compiler which code path is most common. I "think" some of those optimizations in the kernel are assuming your going to compile with GCC on intel to get the performance gain. Please correct me if I am wrong.

  141. Cache misses and other nonlinear behavior by Frank+Sullivan · · Score: 2

    In theory, a human can write better assembly than a compiler - in a vacuum. But code doesn't exist in a vacuum, and it becomes less of a vacuum every day. Consider these few issues:

    * Cache misses: page faults that cause L1/L2 cache misses (going straight to memory, or worse, to disk) cost thousands, maybe millions of instruction cycles. Do you write all your assembly language to neatly break your data structures across paging lines? No? Didn't think so. btw, this level of optimization can be performed in C in many cases, by good C programmers.

    * Pipelining: Do you hand-code your assembly language to give the best hints to the branch decision-making hardware so it can optimize pipelining? Really? You're better than me. Or maybe your code is so much better than compiler-generated crap that the branch lookahead hardware support is useless.

    * APIs: I suppose you don't mind reverse-engineering every library call you might ever need, so you can avoid those wasteful, inefficient C headers.

    * Et Cetera: The days where humans can outthink compilers in the general case died with the PowerPC and Sparc I. Those of you who have written 1,000,000+ line assembly language programs are free to tell me i'm full of shit. Those of you who have worked on million-plus line code in high-level languages are free to give me an AMEN!!


    ---

    --
    Hand me that airplane glue and I'll tell you another story.
  142. counter-nit-picking by TheDullBlade · · Score: 1

    The term water, in the sense I used it, indicates a singular continuum bounded by the absence of water molecules. To indicate individual molecules, this must be expressly stated or clear from context, as English is strongly biased toward the aforementioned meaning.

    Water does not sink into a bucket of stones, it flows. Furthermore, it flows into the bucket and amongst the stones, but not into the stones. Water can sink into sand, but sand is not stones.

    Yes, water can seep into stones, but seeping is not sinking.

    --
    /.
  143. Re:here's how you do it (I think) by Anonymous Coward · · Score: 1

    There are probably 2 lines of attack which have to converge to the same point. There are some magical trade secret algorithms which go into designing the most efficient processors (such as the PPC), which are not "RISC" but OInC (optimized instruction set computers)--there are algorithms which are used to optimize the flow of bits through the machine. The chip manufacturers aren't going to spill the beans on these, so you really aren't going to know other than in a general way how to get those bit through the processor as fast as possible. The manufacturere, like IBM which has its feet in both the manufacturing and software camps, could presumably fine tune these algorithms to particular features of compiler technology--specific chip design algorithms coupled to specific compiler algorighms, so that you can either make a deterministic prediction that bad things will never happen or a probabilistic prediction that bad things that will slow you down greatly will be very rare events. I don't think anybody has gotten into this yet, but am not in a position to be informed if they were. (I'm a physicist, not a computer scientist.) Perhaps this is part of Transmeta's claim to fame.
    There is also something apparently magic that can happen when you do a really good job of *designing* an OOP language. This usually takes some time. I'm vaguely interested in scientific computing and you hear stories. Scientific calculation people love Fortran 77 and hate Fortran 90 and 95: Fortran 90 and 95 were abominations created by a bunch of CS professors who vaporlock if you can't do recursive programming, etc., just like C (never mind that any decent implementation of Fortran has the ability to run C subroutines if you want to use C to manipulate some data in a way which would be clunky in Fortran 77). Fortran 90 and 95 were 15% to 30% slower than Fortran 77--in other words, speed comparable to C (this is not flamebait, this is the word I've heard from people who pay attention to the number of _hours_ it takes a routine to run through a calculation). There is now an OOP Fortran 97, and I've heard a few strange stories about it--it seems to be 5% or so _faster_ than Fortran 77. My point is that a totally buzzword compliant language--like Fortran 97--can be designed in such a way that it can kick the butt of a carefully tuned and optimized language like Fortran 77, or at the very least keep up with it. This is already on the streets. What presumably has to happen for Java is that somewhere, sometime, it will fall into the hands of a group of people who know how to make languages run quickly. Right now, we are at the stage of knowing what we want/need it to do, and doing our performance tuning (such as it is) around that. Hopefully, when the performance adaptations have to be made to Java it will not be get out the nuclear weapons and let's start over.

  144. Java is still slower by Xeger · · Score: 1
    I use Java frequently at work, and personally I prefer it over almost any other language out there for application and network programming. But I would be remiss if I didn't point out a few potential problems with Java as a language, that make it less desireable for situations where performance is 100% critical.

    First off, I agree with the hypothesis about garbage collection being potentially faster than explicit allocation and deallocation (especially on a multiprocessor system!) Second, it might be possible to make memory allocation code aware of repeated allocations and deallocations of the same size, and have it set aside a few chunks of memory to be recycled continuously.

    The problem with Java, IMHO, is not garbage collection. It's all the dereferencing that kills you. Consider the case of a line of Java code that refers to the member variable this.myInteger. The Java VM (or the Just In Time compiled code, or whatever) first must convert the object handle to a memory reference to get the object's base address. (Object handles are a necessary evil because they let the garbage collector reorganize memory blocks on the fly.) Next, the VM must add an offset to the base address to obtain the address of the variable. That's a whole lotta dereferencing to get at a simple int.

    Running JIT-compiled code on a RISC machine, we've just performed a bare minimum of 3 loads to get to something that would have required only 1 load in C, or 2 loads in C++. The numbers are different on an Intel machine, but everyone knows those are slow by nature. ;)

    Another area to examine is method calls. In C++, non-virtual method calls cost almost nothing, and virtual calls have a small overhead. In Java, however, all method calls are virtual--not only that, they're called by method name instead of vtable pointer. A JIT could try and optimize this out, but it still must make allowances for introspection and method calls coming from un-JITted code.

    I'm sure I've missed a few reasons why Java is slow, but I think I've kicked it around enough for tonight. Until someone can tackle these problems, Java is still slower than an early-binding language like C++. But with judicious use of native code (which has of course been aggressively optimized) we might be able to narrow the performance gap to ~15%. That would be enough to convince most people that Java is not a toy.

  145. A little bit o' proof by cstaylor · · Score: 1
    Here's the output from running the following Java code:

    byte[] buffer = new byte[1024]; // Creating the buffer once

    for (long loop = 0L; loop {
    buffer[loop % 1024] = 42; // some number
    }

    And when running the JDK 1.2.2 VM with -verbose:gc you get a few GC messages at the beginning of the program, and then the GC never comes back.

    However, stack-based short-lived objects *are* important for high-performance math. James Gosling is working on that as a possible future enhancement for the VM.

    Just my 2-cents

    -Chris

    Yes, I know the code is contrived. I'm just trying to illustrate the fact that if you think before you allocate, Java is no slower in that regard than a manual "free-and-release" system like the standard C heap.

  146. Tailored Optimizations for asm conv. by LordOfYourPants · · Score: 1

    One thing which hasn't been considered to optimize the high-level->machine level conversion is having the compiler forward the code via network to a Sega Genesis. This code can then be blast processed and converted to something near-native to the transmeta instruction set. Note that line rate is irrelevant as the Genesis implements a ~3x10^70:1 compression ratio for 2 gigs of data in a single cycle. This is only when Sonic the Hedgehog is plugged in. Space Harrier seems to give a bit poorer performance.

  147. Oops... stupid HTML by cstaylor · · Score: 1
    That for loop should read:

    for (long loop = 0L; loop

    Stupid entities... why can't /. support the tag???? :D

    -Chris

  148. when ? by serialk · · Score: 1


    at what time will there chips actaully be out ?

  149. Comparing JIT Misses The Main Point ! by FrankW · · Score: 2

    Hmm, I think the author missed the REAL advantage of Code Morphing (or Binary Retranslation).

    JIT Compilers (or even FX32!) are pieces of software running ABOVE the OS layer. They are executed for a specific application which runs on top of it (that is the application on top of the JIT environment).

    Yes there are similarities between Code Morphing and JIT compilation, as each does some profiling to get better performance in frequently executed code areas. It is further true, that on more predictable and orthogonal platforms like the Alpha, all kinds of optimisation depend on knowing the underlying platform.

    BUT they only see the code from the application running on top of them.

    The processor itself however gets code from MANY SOURCES. The number of processes running on any machine today is quite large. In addition things as shared libraries, OS calls, etc add a pinch of code.
    Hence the processor still has to perform all kinds of clever things, trying to resolve dependencies, avoiding stall and stalling at times. For modern multiple Issue machines this gets VERY complicated. It slows down cycle times, increases size (and hence speed) takes up space which could be used for better performing optimisations (larger cache, etc), etc.

    This is one of the major headaches for CPU designers these days, as these parts increase unproportionally in complexity when trying to add more functional units, etc.


    This is the reason why VLIW (very long instruction words) are used in the next generation processors (i.e Itanuim). In those the compiler does all the hard work, resolving all dependencies stalls, etc, at compile time, and then simply sending one long instruction to the processor, which includes one instruction for every functional unit. This means the processor can run at MUCH Higher speed, as it loses a big overhead complexity.

    This is the reason why TRANSMETA uses such a VLIW, it enables them to build fast, small and CHEAP processors which can run at quite high speeds. (I am sure they could do more than 500Mhz, but then cost and power consumption get in the way. And the are much smaller and havent (yet) got the man resources as Intel, AMD)

    Now where does code morphing come into play ?
    I recently saw a few comments asking for access to the VLIW instruction directly, bypassing the code morphing. Well, that would DEFEAT the concept !

    The Big Thing about code morphing is that it sees ALL instructions which get executed, in the right order. This includes the library calls, the OS routines, etc. Hence it can do MUCH BETTER optimization than ANY Compiler by definition can do. A compiler can only know about the program it compiles (yes you can do clever profiling, but you still only have the application source to optimize), it cannot optimize ACROSS APPLICATIONS.

    The code morphing software has absolut knowledge about what code has executed and what code currently tries to execute. This enables it to do much better optimizations.
    Furthermore it is SPECIFIC to the underlying processor. It knows everything about the processor and even more important, the processor is designed specifically for it !
    (Hence the two versions are internally comletely different, each optimised for the kind of code linux/Win it is executing)


    This is what makes the concept so interesting. It adds an additional abstraction layer on top of the actual hardware, allowing the underlying implementation to be always up-to-date and optimized for the application (removing all the complexity of having backwards compatible hardware, ask Intel ;-)


    Overall, the approach is a MAJOR step forward, IMHO it can be compared to the CISC->RISC transition. It is after all the first implementation, competing with products well understood and optimized for years. It has potential to solve many of the major headaches of modern CPU designs (including COST). Comparing it to JIT is missing the BIG points.

    I think it will go a LONG WAY !


    Frank

  150. Speed vs. Convenience by The+Wookie · · Score: 1


    First of all, I have seen a few examples of Java running maybe only 10% slower than the equivalent C program. In these cases, the programs were doing raw calculations and not allocating anything.

    While I would love to see Java run as fast as C++, that's not the most important thing to me. I can develop in Java much faster than C++. Doesn't "fast enough" outweigh "fastest possible" if you get it sooner?

    Perl is slower than C, too, but you don't hear as many complaints about that. For most applications you use Perl for, writing the equivalent in C would be a huge pain.

    1. Re:Speed vs. Convenience by arivanov · · Score: 2

      Doesn't "fast enough" outweigh "fastest possible" if you get it sooner?.

      Yes it does but only if this is something you want to run not very often. If it is running constantly - think kernel, daemons, etc it does not outweight a thing.

      Also perl or python still beat the sh... out of Java on anything and develkoping in either one of those is much faster. What I actually want to see finally is a perl.so or python.so plugin that loads perl (or python perse) as an application and does policing on all dangerous IO functions.

      After all browser functionality is just an elementary module. Than the entire jabba the hut issue will be finally put where it belongs - to rest.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
  151. Re:its free that does the work. by Anonymous Coward · · Score: 0

    What does it have to do with C++? I mean, doesn't the same argument/counterargument hold for other languages with dymanic memory allocation?

  152. Sun#s Hotspot does what you want by Anonymous Coward · · Score: 0

    Suns Hotspot Java Compiler uses runtime profiling to optimize code. I agree completely with our points. I want to add that with Component based software the JIT will have another advantage. The advantage is that it can inline calls to the Component. A static compiler such as a C++ compiler cannot do that because it has no knowledge about the component. With an application build out of a lot of little components that can be replaced at runtime the JIT will have a huge advantage. Markus

  153. Alone and confused. by Anonymous Coward · · Score: 0

    Aren't you being a little optimistic here? You assume garbage collection can be cheap via a stack, but has it ever been cheap? The answer is yes, but only in the simple one program world of DOS. But for the multitasking, you're going to have lots of unusable chunks. I do agree that a JIT can optimize code, but the analysis requires more resources. You just stated that the JIT has to rewrite the code everytime it runs it. That means inserting and analyzing optimization routines all over again. It is true that a programmer can never optimize code as well as a compiler who knows the processor intimately, but any good assembler worth a salt can optimize code to at least 70% because he knows which loops/routines are executed the most. That's more than acceptable. I don't know if the remaining 30% is worth it. And based on your assumption of processor power, it's probably not.

  154. about Java by jilles · · Score: 4

    Hi,

    What transmeta does is very similar to SUN's hotspot compiler for java. Like Transmeta's code morphing, hotspot compiles and optimzes bytecode on the fly using profiling data to optimize the parts that are executed most.

    Your question as to why Java is still slower than C++ is answered in a series of javaworld articles (www.javaworld.com): http://www.javaworld.com/javaworld/jw-11-1999/jw-1 1-performance.html and http://www.javaworld.com/javaworld/jw-12-1999/jw-1 2-performance.html and http://www.javaworld.com/javaworld/jw-02-2000/jw-0 2-performance.html

    Basically the problem is that stuff such as memory allocation, garbage collection, synchronization and runtime type checking have a performance price (you get a safer programming environment in return). The articles discusses these performance issues and give some usefull advice to avoid these bottlenecks.

    --

    Jilles
  155. NP problems by Anonymous Coward · · Score: 0

    Well in that case think of Penrose tiling. People do kick ass at NP hard problems. That's why people are employed to pack things into boxes, not robots.

    1. Re:NP problems by AndrewHowe · · Score: 1

      If it is hard to pack things into boxes, it is only a mechanical problem. Finding an optimal solution is hard, but my Sinclair ZX-81 would beat you at that (if you disallow RAM-Pack wobbling :)

  156. Re:here's how you do it (I think) by Anonymous Coward · · Score: 0

    First of all, I don't think any CS people work on FORTRAN at all. FORTRAN 90+ are just an attempt to keep the language current with the newest language fads. Keep piling in features until you can't find the structure underneath the junk. However, I'm not too happy with FORTRAN 77 either. The claim that it is faster is due to a couple of reasons, but the most often cited one is that everything is passed by reference in a global location. You can of course do this in C with global variables and achieve similar function call performance. Aliasing are the second most cited reason, and const, ANSI aliasing rules, and a good compiler all help make that less of an issue.

  157. Not so obvious by Morgaine · · Score: 2

    There's local overhead in the instrumentation code, but not a net overhead in the combined instrumentation code plus controlled loop code: the loss due to extra code prior to a loop is far less than the gain resulting from the more efficient loop code produced by introducing that extra instrumentation.

    --
    "The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
    1. Re:Not so obvious by nhowie · · Score: 1

      The obvious wrong answer, because it assumes "program code" as a constant.

      My entire point was about the difference between the Java program code and the asm code.
      I was assuming that both used the same algorithms, so were comparable in everything except language used (which is a very big difference), and said that the compiler would have to optimise the Java one to be faster than the asm one by the same or greater than the overhead.
      My fault though, I should have been specific in that point and differentiated between the two. Anyway, I shouldn't be posting this late into the thread, but I hate when people misunderstand what I mean (despite my being more fluent in C than English - can't stand it, no pointer arithmetic;)
      --

    2. Re:Not so obvious by stevew · · Score: 2

      Jilles,

      Beating a modern compiler is easy...if you have the right backround. For people that came into programming when 16KBytes was ALOT of memory, assembly coding in an efficient manner is like breathing air. For folks that grew up in the era of 16Mb isn't enough to boot my OS, well... hopefully you see my point.

      Modern compilers are marevelous pieces of technology...and I've worked with some of the best folks in the field at one time or another (most of these folks are off working at HP or Intel right now on Merced me thinks..) Those same folks could write better code than their compilers produced.

      Note - The Merced and it's breathern are considered decendants of the VLIW machines built in the 80's. Alot of the same people are working on Merced that were the engineers at those VLIW startups.

      --
      Have you compiled your kernel today??
    3. Re:Not so obvious by nhowie · · Score: 1
      The question was:
      can anyone actually give me a good reason why Java + JIT should be slower than Good Programmer + Assembler?
      I was simply giving the obvious answer that:

      JAVA version = program code + VM + JIT
      Asm version = program code

      I'd say this is a good reason why it should be slower, since the average speed increase has to be greater than the overhead (can a compiler really produce code that is that much faster than well-optimised asm?), also I'd like to actually see figures for the best case vs worst case for an average userland program running on this, since in any decent sized program, there's going to be a lot of code to keep track of.
      --
  158. Why shouldn't it? by Anonymous Coward · · Score: 0

    The instruction set in Crusoe is supposedly 'morphable' and from this I get the impression that the programmer could just rewrite the instructions (of course, after obtaining the source) to better optimize Java or any other language. Hmmmm... or maybe I'm just typing a bunch of nonsense and trollin'.

  159. Dumb compilers, dumb languages by blanu · · Score: 1

    The problem is indeed that the compiler doesn't know everything that the programmer knows. However, the reason for that is that modern programming languages aren't sufficiently expressive. There's no reason that I shouldn't be able to tell the compiler that these particular variables should also be less than 256. Not only would that be useful for optimization (for instance, I could say this number is between 7 and 17, and it knows it can store than in a byte instead of 3) but it would also simplify some programming tasks such as bounds-checking for arrays and input. Really, you should be able to specify in a program everything you know about inputs. This would also be useful for choosing algorithms. You could tell the compiler you needed to store some information to be looked up by a numerical key and inserts can be slow, but lookups need to be O(1). And it could choose an appropriate data structure for you. Same with sorts.

    The performance jump just from finding the right data structure/algorithm from the hundreds available would be a huge performance leap.

  160. Ultimate optimisation needs better hints by dingbat_hp · · Score: 3

    when the programmer has decided that space is more important than speed.

    That would seem to be one of the issues where manual coding has the edge. I agree with your general point, but I think there's still work to do before auto-generation is perfect.

    Imagine a case where an algorithm could easily optimise either way speed/space - maybe there's a hash table that's going to hold some programmer-controlled depth of the initial search and allowing it to expand would make usage quicker, but eat memory. A human coder would probably know the pattern of usage this routine would get. By knowing the practical number of data items to be encountered, they could optimise the hash size. A compiler can't do this, because the necessary information comes from the overall systemm domain, not just the source code. To allow compilers to compete effectively, we'll need much more subtle optimisation hinting than we currently have; especially that of the form "ignore this block, it's only used once" and "speed like crazy here, and you'll never have more than 5 items loaded simultaneously".

    I'm not a compiler / assembler geek, so maybe someone is already doing this ?

  161. No. Hand coded assembler is faster. by FutileRedemption · · Score: 1

    >It's my belief that it should be possible to make a compiler generate much better code in the general case than someone writing hand coded assembler.

    No. Hand coded assembler, coded by a clueful programmer, will always be better than compiler generated code. Or at least as long as a compiler cannot considered an advanced Artificial Intelligence...

    To some large extent this is a question of creativity. Compilers are not creative.

    for argument (1): Well, we are talking about a =clueful= programmer.

    for argument (2): This isnt significant.

    For the profiling: A clueful assembler programmer can do this too.

    1. Re:No. Hand coded assembler is faster. by mikera · · Score: 2

      Hmmm.... back in the days when I used to dabble in a bit of demo coding I would frequently benchmark my 486 code against the (Borland) compilers.

      I almost inevitably won. Hands down.

      Compilers may have come on a bit since then, but I'd still fancy my chances. Of course, I would only dream of doing this kind of optimization for the most critical inner loops. Most of the time it isn't worth it.

      But I would like to stress the point that a human with good knowledge of the system and the problem at hand should be able to beat a compiler in many cases. Compilers can't think laterally.

      For example, consider a function that must fill an array of bytes with consecutive numbers between X and Y.

      Compiler produces beautifully optimized loop that increments a counter, moves an array index and sets the bytes appropriately. If it's clever, it might use the same register as both an index and a loop variable. No problems. Pretty fast, but....

      Joe Hacker takes one look at the problem, sets a 32-bit register to contain the bytes X, X+1, X+2, X+3 and does a quarter of the number of loops just adding 04040404 hex to the register each time.

      Joe wins, because he *knows* that most of the time X and Z will be sufficiently far apart for his bit of hackery to be quicker. His knowledge of the nature of the problem will beat the compiler.

      This is the kind of thing that humans will be better at for the time being. Once AI compilers start to make an appearance, things may change. But right now, I'll bet on human intuition any day of the week.



  162. Re:Why not? by ASM · · Score: 1

    Sounds great, but wouldn't that learning process lead to the possibillity of bugs, based on incorrect optimizations the compiler had learned? How would you get the compiler to generate proofs that show that its optimizations will be guaranteed to work? Further, what about human errors in coding? Wouldn't human errors in code make it possible that the compiler learns really bad optimization methods? Just some random thoughts....

    --
    Fish
  163. It's even older than Hotspot [Was: Nothing new] by Anonymous Coward · · Score: 0
    Finally somebody with a clue. Yes this is old news, but apparently the idea is starting to sink in. The SELF project were among the first to realize the technology and as everybody knows by now, it sure does work. This project inspired Anemorpheus (sp?) which made the Hotspot tech. for Smalltalk, and then later for Java. What is really interesting is that Sun was highly active in the SELF project (talking about a full circle).

    Crouso is basically just taking the idea to it's logically next step (hardware support for this).

  164. its free that does the work. by BeauNiddle · · Score: 1
    I could be wrong but I think he is saying free() does all the work. If you have a 500 object stack and then free the first all the others get moved down with a large overhead, do this 499 times and it is slow.

    However by using global garbage collection the compiler knows that all 499 are going to be deleted so it deletes them and then shifts the remaining memory down. This means that the speed increase of using a stack to create memory isn't lost when deleting and so overall it should be faster.

    1. Re:its free that does the work. by Anonymous Coward · · Score: 0
      However by using global garbage collection the compiler knows that all 499 are going to be deleted so it deletes them and then shifts the remaining memory down. This means that the speed increase of using a stack to create memory isn't lost when deleting and so overall it should be faster.

      You're forgetting that program execution isn't always linear - ie conditional branching can and does occur, so the garbage collection is a run-time occurrence, not a compile time occurrence. Furthermore, unless you're the kernel doing VM and memory remapping, moving people's pointers is a bad idea, unless you're using pointer indirection for memory management, ala Macintosh & NewHandle() [and in that case, the programmer can lock the handle and be guaranteed it won't move...]

      Why not just use C++ like a normal person? If you want a local object allocated on the stack, you can. If you want a pointer that will last beyond the scope of the function, you can new it. Allocating memory exclusively from the stack will certainly save cycles, but it will also break 99% of programs.

  165. We're close... But not there yet by Anonymous Coward · · Score: 0

    Well... History has proven that handwritten assembly has always had an edge over HLLs. But now you are suggesting that Java, which IMHO runs like Windows on an underclocked XT, could be faster with A JIT-compiler? ROFL! No, off course not! A compiler may know clockcycles and all, but it does not understand the actual logic of a program, or the more complex tricks. It can hardly keep track of the flags (meaning that it doesnt know when to replace MOV AX,0 with XOR AX,AX and those kind of things) and when it comes to complex instruction re-ordering it becomes too much. If you don't believe me, just download the Quake source and see the difference with and without assembly (especially software rendering). Compilers have improved, but handwrittem assembly still has an edge.

    But that is about to change. Indeed when the Merced and K8 are released, compilers might win. These CPUs were optimized for compiler optimization and I think when the x86 architecture is ditched, the days of handwritten assembly are over. Until then, stick with it.

  166. Re:Isn't this REALLY the smarter compiler arguemen by stevew · · Score: 2

    I'll just say that when the compiler and the machine are already married together, the compiler DOES know about the size of the caches. I'm specifically talking about VLIW hardware as oppossed to a super-scalar machine that has multiple execution units. The main difference between these two classes of machines is that the super-scalar machine has a hardware layer that is doing the instruction scheduling where a VLIW box has the compiler doing that function. Thus I feel your observation is more of an apples and oranges situation.

    --
    Have you compiled your kernel today??
  167. False figures by joss · · Score: 2

    Where the hell did you get these figures ?
    What compiler, what OS, what architecture.
    They look *completely* out of line with every
    other test I've seen. Check out the JGL against STL comparison above for some real numbers.

    For a start, what the hell is the difference between integer division in C or C++ - no compiler that I've heard of written in the last 5 years produces different code in these two cases.

    Secondly - who remembers to turn on optimization: anybody who cares about performance and isn't completely stupid.

    --
    http://rareformnewmedia.com/
    1. Re:False figures by lgas · · Score: 1

      gcc, g++, Linux. The difference between C and C++ is whether it's in a class or not. If you compile the C program with g++ you naturally get the same results.

  168. bickering by Anonymous Coward · · Score: 0

    Well, all you geeks who's lives revolve around code, I can sympathise that you are indeed vulnerable to advances in technology. If you were smart then you would develop embedded hardware and write code for it too. That way you would gain the best of both worlds. Any dullard can write a c/c++ VB app. True legends make the hardware, which makes things happen in the real world!

  169. Re:Transmeta Speed by mgiammarco · · Score: 1

    Can it be done with standard cpus ? I mean, buy a motherboard with 2 alpha (or 2 power pc) to use with linux. When used in native mode you have a very fast computer. If you have to run a x86 program one cpu translates and the other executes the code translated...

  170. within X% of the performance of native code by The+Pim · · Score: 1

    [slashdot bug reformatted first two posts]

    As the microkernel unix people used to say before they finally died out, "We are within X% of the performance of the original".
    - Alan Cox

    The JIT guys can write all the papers in the world about how they should (yes, really should) be able to get native code speed. "No, that extra level of indirection is really a performance benefit, not a hindrance! I'll believe it when I run Java bytecode and it's not slow.

    --

    The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
  171. Real world?!?!? by marcus · · Score: 2

    >I can say that they do not offer Java in the
    >CS department. This is a reflection in the fact
    >that Java is not important for what programmers
    >actually do in the world.

    Man have you got to get out in the real world! You and your profs that is. Right now Java is my job. A year ago I was coding c and asm on an 8051 derivative for an embedded project. Today I'm building a distributed network management system for a company that builds fiber backbones. Guess what the most universal platform for executing code is? I'll give you a few clues: All their winXX boxes support it. All their *nix boxes support it. Any ideas? It's called a web browser and everyone that might be interested in designing, configuring, maintaining a network, every one of their laptops, desktops, and workstations has one.

    Sooo...which platform do we support? What language do we choose? You only get one choice. I suggest that you transfer to a, uh, more modern school ASAP.

    --
    Good judgement comes from experience, and experience comes from bad judgement.
    - W. Wriston, former Citibank CEO
  172. I'll open with a pair of twos... by Steve_Dallas · · Score: 1

    You're correct in saying that JIT generated code can run faster, be more efficient, etc. The only real problem is that you still have the initial burden of compiling every time you want to run the code. This is not a trivial amount of the time spent waiting for JAVA (to use one example) to do it's thing. Compiling of the sort you're describing is hugely complex and in the medium to long term will outway any benefits you might see from efficiency improvements in the actual code. Cheers

  173. you have no idea what you are talking about by spacewalrus · · Score: 1

    read my post entitled
    Hey everybody..Listen up

    Later
    Spacewalrus

    1. Re:you have no idea what you are talking about by GnrcMan · · Score: 1

      First off, fuck you. I don't appreciate being called an idiot.

      Second, your oh so great and well thought out opinion has a flaw. Humans are falible. They make mistakes that a compiler will never make.

      Third, do I really need to add the disclamer:

      Compilers will generate more optimized code (unless a human has 150 years and a huge library)

      So lets see...you've insulted me, but that isn't what really pisses me off. You've insulted me and failed to add anything useful to the discussion. We all know that someone can calculate out Pi to the billionth digit using (lots of)paper and pencil. But who cares? They will: A. make a mistake and B. be dead before they finish.

      Casey


      --GnrcMan--

  174. Re:Match code frag. as opt. tech. (Was:Evolve code by Anonymous Coward · · Score: 0

    If you think about it closely, you'll find that the easiest way to really "clearly define" a problem is often just to write it in a programming language. No, I don't think, computer programming will ever be fully automated. Sure, improvements in technology and new tools will make it much easier for a programmer to oversee the complexity of today's programs, but this doesn't mean programmers will no longer be needed. Tasks will shift from tedious low-level work to reuse and combination of existing building blocks, but this will be compensated by a growing project complexity.

  175. Re: Just wondering by guardian-ct · · Score: 1

    Same morphing regardless of language or virtual machine...

    From my understanding, the Crusoe morphing starts from the x86 instructions. So, regardless of what language the program was written in, or how it was invoked, the Crusoe chip is still morphing the x86 instructions into its native code. The Crusoe is not morphing Java into a virtual machine, its at a lower level than that.

  176. Re:Self-Modifying Code by Tune · · Score: 1

    Not at all true... self modifying code is still around not only on JIT compilers but in interpreted languages and even device drivers (Or how do you think some device drivers adjust in real time to hardware changes?)

    I guess you're right, though I'm not sure whether we should consider things like function-vectors as self modifying code. These programming structures have relatively well-behaved formal counterparts (such as lamda calculus). Are more dangerous constructs used in practice too?

    BTW. What happens when two threads executing on two different processors try to optimize the same piece of code? Without proper synchronisation mechanisms (such as write-through or cache snoops) this is very dangerous. Work-arounds, suche as semaphore will most probably cause a performace penalty.

    Also, how to cope with asymetric multiprocessing? This obviously raises problems for binary incompatible processors using the same memory. But it will also lead to at least inefficiency when two processors ARE binary compatible but need different optimizations...

  177. Re:Q: Performance of C++ STL vs Java JGL ? by Anonymous Coward · · Score: 0

    I don't have a real comparison for you, but check out this comparison, which might be a starting point for your comparison. Basically, if you use built-in types (int, array) in Java, it can go fast, but when you start using classes, Java slows down while C++ STL stays roughly the same speed.

  178. The Final Word by spacewalrus · · Score: 3

    This whole question boils down to whether or not a human mind is deterministic. We can prove that a
    compiler is, or in fact that any program that runs on current computer hardware is.

    If a human mind is deterministic then it follows that a human could do just as good as a compiler simply by using all of the same algorithms to generate the code, although the computer would certainly finish alot sooner.

    If a human mind is NOT deterministic then it is clear that a person who had no programming skill whatsoever could beat the best compiler given enough time, a specification sheet of the language he was to generate the given algorithims in and a circit diagram of the cpu he/she was coding for.

    This is VERY simple to see. A compiler can NEVER
    under any conditions for any reason EVER EVER EVER produce better code than a human because the
    human can ALWAYS produce the same code that the compiler did. The question should be is it reasonable to wait the 150 years of time it would take me to compile mozzila by hand or should I build a compiler to do it in a few hours. The next question would be how much time(if any) should I spend hand optimizing the compiler output.

    A question I have is why cant we do the same thing that a JIT compiler does before runtime? It seems
    clear that you could easily get very close to as much optimization from a multi pass profiling compiler as you ever could with a JIT without the runtime overhead.

    Later
    Spacewalrus

    1. Re:The Final Word by Anonymous Coward · · Score: 0

      Under some idealistic assumptions this might be true. But unfortunately humans are fallable and mortal. So why yes you can by hand use that cool formula of the kth digit of pi, a person trying to generate the 10 trillionth digit will fail while a computer will not. If nothing else the person will die. People use computers to balance their checkbooks - not because they can't do the math involved but because the computer is error free and fast.

  179. Reinventing the wheel by Anonymous Coward · · Score: 0
    For all the discussion of the exciting techonological inovations of Transmeta, I recall reading something in The Register to the effect that AMD has copper K6-III running as cool as Crusoe and considering they shipped them a while ago at 500 Mhz I would think they ship the mobile version just as fast if not faster. Here is the point. By the time the 700 Mhz version of Crusoe hits the market, I think Transmeta said end of Q2, AMD should have proven technology out that for most purposes will beat Transmeta. Sometimes bells and whistles, as impressive as they might be, don't do anything but attract attention.

    Second point. I might just be starting out in this field, but what would make the most sense to me is to include performance profiling with beta testing software. Just imagine an algorithm using data gathered from several thousand users in almost real time over the internet automatically recompiling and distributing the software back to the testers. A modification of the same process might allow big software companies to vastly improve their product between releases based on the data collected from users. Better still, imagine adding branching to the algorithm for different uses. Microsoft might no longer have an excuse to sell the same piece of software for insanly different prices. Serving a web page rather than playing Quake, here's your kernel. Using your database for this or for that, here's a different component. Just my two bits between classes.

    1. Re:Reinventing the wheel by Anonymous Coward · · Score: 0

      Very stupid answer. The AMD K6-III is NOT like crusoe. Crusoe is not only a high-speed chip with low power consumption, but a *NEW* architecture, similar to Itanium. Can someone explain to these dolts for once and for all that VLIW != Superscalar RISC/CISC?

  180. C/C++ don't check bounds by Jecel+Assumpcao+Jr · · Score: 1

    In Self 2.0 (back in 1991) they got half the speed of C for highly numerical benchmarks. On conventional processors, that is as far as a language like Self or Java can go.

    C, C++ and Fortran get their speed by not including safety features:

    • checking that indexes are within array bounds
    • checking that math ops don't overflow/underflow and so on (automatically converting numbers to larger representations if they do
    • they have pointer math so you can index faster in loops, at the risk of generating nonsense pointers

    On a very different topic, I haven't seen any mention of memory usage. A dynamic compilation system only has to generate code for that parts of the program that are actually used, while a static compiler must generate code for everything. In the first case you lose some memory for the compiler itself and some more to have the "source" available at runtime. But the fact that you can use the rest for just part of the program means you can be very agressive in inlining and loop unrolling compared to the static compiler and this could result in faster code.

  181. This is so true, but it's getting harder by Dominic_Mazzoni · · Score: 3

    I completely agree that humans can always write better assembly than compilers can generate, but it's getting harder.

    Not all that long ago, compilers weren't all that smart, and processors were a lot simpler. Today processors use tricks like out-of-order execution and branch prediction, making it extremely difficult for an assembly-language programmer to determine exactly what is going on. It still remains the case, though, that the programmer may know something about the code that the processor doesn't (that a register will never contain a certain value, or that some execution paths are much more likely or more critical than others) which will give the programmer the edge.

    None of the just-in-time compilers that I am aware of recompile code as they go. If the Crusoe actually attempts to optimize sections of the code based on runtime profiling information, this could be the main reason it performs so well. There have been academic papers written on this idea before but this would be the first implementation I've heard of.

    1. Re:This is so true, but it's getting harder by Anonynous+Coward · · Score: 1
      If the Crusoe actually attempts to optimize sections of the code based on runtime profiling information, this could be the main reason it performs so well. There have been academic papers written on this idea before but this would be the first implementation I've heard of.

      This is exactly what Sun's HotSpot for Java was designed to do. Arguments about how good (or bad) of a job their software-based runtime profiling works right now are left as an exercise for the reader.

    2. Re:This is so true, but it's getting harder by tagish · · Score: 1
      It might also be said that humans can always write a better compiler than the one they just completed.

      For arguments sake let's arrange a code off between a completely optimal compiler (this is a somewhat idealised scenario) and a completely optimal human programmer.

      In a normal compile once, run many times scheme of things, assuming the human and the compiler program equally well neither has the advantage, but if you make the compiler into a JIT compiler that works actively to optimise the code in real time, perhaps recompiling more than once before it is satisfied, the human loses out. It becomes the equivalent of shipping a highly skilled assembly language programmer with each copy of the software and having them tweak it in real time (assuming they could program very, very quickly).

      I can't see too many theoretical reasons why a compiler shouldn't program as well as a human. All of the obvious ones are consequences of the human having a better understanding of the problem so that for example s/he knows that a variable can only ever hold a small subset of the values its size would allow. Most of these things though are either moot in the context of modern processor design (because the natural word length is fastest) or are actually better left to a JIT compiler that can monitor the program in real time and, if necessary, rewrite it also in real time.

      Imagine for example a numerically intensive algorithm which processes a data stream who's character changes slowly over time (image compression for example). A JIT compiler (+ instrumentation) could watch the algorithm as it runs and, at some point mid data stream, decide it needs to re-optimize the code. No problem: it rewrites the code and switches it in real time.

      I suppose this is the core point I'm making: all other things being roughly equal Transmeta style technology may have the potential to yield processor architectures that actually work faster than the current theoretical maximum.

      --
      Andy Armstrong
  182. extra processor? by RoLlEr_CoAsTeR · · Score: 1

    Well, I noted that someone else said that it wasn't really a problem, but, if someone were worried enough about overhead for processing time, couldn't the JIT/VM become standard and then companies just put an extra, not so whiz-bang processor "alongside" the "main" processor to "process" and "optimize" the code? or would that create even more lag time because of slow communication between the processors/memory usage? If it wouldn't be a problem, and this could be done, then the entire capabilities of the "main" processor of said system could devote its time and cycles to executing the created/optimized code produced by the "extra" processor..

    or, maybe I'm just babbling..

    --

    Insert mind here.
  183. here's how you do it by hoss10 · · Score: 3

    I was about to try and answer your question but it's bloody difficult to explain!
    With garbage collection implemented in the best way the JIT or whatever doesn't try to hard to keep a record of what mem has been allocated.
    But, whenever the system isn't paricularly busy or if it panics due to low memory the JIT can traverse all the objects that have been allocated (just starting with each of the Thread objects I think) just by checking the references each object has until it has a list of all the objects in the system. There are a few reasons this is more reliable (not just faster) than maintaining a reference count.
    Then I suppose it (the JIT) could ask the OS to rearrange it's memory mapping thingies (Page Translation whatchamacallems) to put any of it's user space pages after the current position of the stack or something like that (hard to describe but i know what i'm talking about (even though i can't remember the name exactly!))

    Basically it can be made to work but maybe not particular efficiently because it could result in a full page (4K) been used for just one small object but this can also be fixed

  184. Isn't this REALLY the smarter compiler arguement? by stevew · · Score: 5

    Years ago I worked at a company that was going to bring VLIW to the commercial world. One of the things they had to invent was MUCH smarter compiler technology. That was in 1985, so the basic ideas involved here are at least that old.

    Anyway, when the compiler DOES know the hardware, especially VLIW machines, they can indeed do a job as GOOD as a human. (Look - they won't usually do better jobs, when the guys who designed the machine write code, they're going to do what the compiler is trying to emulate..) But from what I remember, static profiling was considered "good enough" back then if the VLIW machine is built right.

    Most codes spend most of their time in inner most loops. If those loops can be rolled up correctly for VLIW you can be executing different interations of the loop within the same instruction slot. Being able to do this is usually the largest performance payoff. If you have a GOOD compiler in the first place, that KNOWS the hardware, then JIT and morphing aren't needed, or help.

    The place that I see the morphing being a step forward is that VLIW machines were considered architectural dead-ends until just now. If I use one of these smart compilers for a machine with say 7 functional units, the code that emitted won't work on a machine with 8 functional units, or at least not be optimized any longer. These machines didn't scale well until now! Code-morphing technology completely removes this limitation!

    --
    Have you compiled your kernel today??
  185. Re:What about the JIT? -Server versions by borzwazie · · Score: 1

    If you read a bit farther into the release information, you'll find that the 700mhz part is designed for handhelds, laptops, etc. A bit farther down you will see a part talking about ease of addign to the architechture for models that are not portable-based, i.e. server architecture. They go further into performance gains from this change in architecture.

    --

    "We apologize for the inconvenience."

  186. Business by Anonymous Coward · · Score: 0

    "can anyone actually give me a good reason why Java + JIT should be slower than Good Programmertm + Assembler? " The good reason is; Sun sells CPUs. The faster they make Java, the less new, faster CPUs they sell to their customers. Intel too.

  187. "Intimate knowledge of the target processor" by Tuck · · Score: 1

    For those that haven't seen it, distributed.net's crypto-cracking client has multiple "cores", from which it selects the fastest one for a given processor.

    I ran a comparison between three machines (MII@233MHz, K6-3@400MHz and Celeron@300MHz). Remombering that they're all x86 architecture machines, there's some huge variations:

    core MII K63 Celeron
    #0 175 569 354
    #1 426 648 70
    #2 359 617 841
    #3 520 547 694
    #4 374 602 693
    #5 362 704 712


    Notice that the largest is a ten-fold difference!

    The point of this? Imagine what your run-time optimising compiler could do when it knows the exact charateristics of the chip it's running on, as opposed to those of the chips it might run on. I know this is an extreme case, but there can be significant differences.

    --
    $ find /pub -beer "James Squire Amber Ale" -drink
  188. Re:Ooo, ooo, start on packing! by John+Allsup · · Score: 1

    True, but languages that punish bad code, rather than just accept it will have better (and usually less -- another improvement) code written for them.

    Consider what would happen if you recieved a 40V electric shock every time you wrote bad code, or were in careless in some way -- there'd be less coders writing code, but the end result would be better (provided that the electric shocks were dealt out fairly -- and this is the problem of standardising...)
    John

    --
    John_Chalisque
  189. Re:Match code frag. as opt. tech. (Was:Evolve code by John+Allsup · · Score: 1

    Think 'little theorem'.

    For example, when proving a major theorem in some research paper, you'll write some smaller things building up to your main result(s) in 'lemmas'.

    What the previous poster is referring to is sets of 'when this holds, so does this' type implications (e.g. if a is invariant bc and ca, we know that ba and don't need to work this out explicity)
    John

    --
    John_Chalisque
  190. Rewriting the program each time by Anonymous Coward · · Score: 0

    If you have the compiler rewrite the code each time you compile (or just-in-time compile) how would that affect performance on a Transmeta type chip? I don't know what I'm talking about, but I think that I read that the chip itself (along with its low level software layer) will learn each time it runs the program so that it can run it faster the next time. If the chip is faced with a different program each time the application is run, how can the chip perform to its best?

  191. Re:Transmeta Speed by Anonymous Coward · · Score: 0

    Of course, such an arragement would also consume twice as much power, negating the very design goal of Crusoe...

  192. ...and auto-profiling doesn't work by milliyear · · Score: 1

    Automatically optimizing just the code that gets executed most often doesn't work.

    Take a simple calculator program for example. It spends most of it's cycles in a wait loop waiting for input. What's the point of optimizing a wait loop??

    And that hyperbolic trig subroutine that you use 3 times a day might never get optimized; but you would fume if you had to wait 5 minutes for an answer when you needed it.

    1. Re:...and auto-profiling doesn't work by AndrewHowe · · Score: 1

      One would hope that it doesn't loop waiting for input. One would hope that it goes to sleep until the operating system provides input.

  193. Re:Self-Modifying Code by ZoeSch · · Score: 1

    Function vectors!=self modifying code but code that patches on the fly (i.e. TCP/IP stacks for non reentrant OSes). That would also exclude DLL's and other dynamically linked libraries.

    BTW. What happens when two threads executing on two different processors try to optimize the same piece of code? Without proper synchronisation mechanisms (such as write-through or cache snoops) this is very dangerous. Work-arounds, suche as semaphore will most probably cause a performace penalty.

    You have to decide what's mode important, in high speed CPUs where the CPU goes 10x as fast as the memory bus write throughs are a bigger performance hits than semaphores, in massively parallel architectures it could be even worse because another processor might be locking access to the RAM or I/O. OTOH, semaphores can be a performance hit if htey're not persistent. That brings you to the following problem, do you run a single kernel image that controls all processes or do you run several kernels with interprocess communication between the kernel images and the threads?

    For example the INMOS Transputer solved the problem through interprocess communication using tuple-spaces in OCCAM, some HA *nixes use semaphoring between multiple kernels, so you have a performance hit which is offset by the more robust setup. And.... BTW, IMHO SMP without parallel I/O and storage offsets the benefits of having several processors except for computational intensive operations that don't depend heavily on stored data sets.

    Coping with Asymmetrical Multiprocessing? Wow, I haven't thought about this in a long while... there's only one answer there... sepparate instruction/data stores and a shared memory space for communications... either that or forget about the whole thing :)

    All systems are inherently inefficient, what you do to make that inefficiency not noticed is what counts.

    ZoeSch

    --
    I hate to agree with davecrazy but...
  194. The C failure with "int" by FutileRedemption · · Score: 1

    >In C, an int is whatever size int is most easily handled by the target system

    This is not a feature. Its a failure of the standardization process.

    You cannot write portable programs if your basic data types are not defined.

    And it also doesnt give todays hardware any advantage. Alpha uses 32 bit int. Using 64 bit would be pointless since portable programm wont use the added 32 bits.

  195. a point about malloc/free vs. garbage collection by TheDullBlade · · Score: 2

    Actually, there is no absolute requirement that free() does all the cleanup work that the allocator, any more than a garbage collector must call a routine to do all of its cleanup work at the end of every block and at every assignment.

    free() could just add the freed chunk to a list of chunks to be cleaned up. A second thread could run continuously tidying up all these chunks with the same system-wide view as a garbage collector.

    --
    /.
  196. Considerable problems by Anonymous Coward · · Score: 2

    I can see two major problems offhand:

    The first is that java is, conventionally, a
    language which runs in a virtual machine. The
    infamous JVM. This involves a layer of
    inefficiency from the word go. Granted, you could
    avoid this problem by compiling direct for the
    CPU in question, but this is a nontrivial
    conversion from a stack-based language to a
    register-based architecture. If you are on a
    java-optimised CPU, compiling direct, this could
    still be pretty good.

    The second objection involves the compiler.
    Compilers take CPU cycles. The smarter they are,
    the more they take. If you compile beforehand,
    this of course doesn't impact execution speed.
    If, of course, you'd like to compile on the fly,
    well, you're going to have to execute the compiler.

    Java also isn't really all that suited to this,
    to be frank. If you want to see a REALLY powerful
    model for this sort of thing, try using a FORTH
    compiler. On a CPU which is at all stack-friendly
    with a good execution environment you can pull
    all kinds of nifty tricks (on the fly modification
    of running code, assembler-like low level activity
    and so on) which can really speed things up
    immensely.

  197. Re:Java specifics by Anonymous Coward · · Score: 0
    One of the most important optimization techniques is inlining. In OO, you can't inline unless you're sure a call is monomorphic (static). Java is even worse than the general case OO -- a Java program simply cannot use inlining in the vast majority of cases. Here's why: To inline a call in OOP, you must know for 100% that the distatch site is monomorphic (i.e. that the call is never polymorphic)

    You may inline if you surely know that object you handle is of given type.
    Simple example:
    Foo.Bar() -- will be translated as -- if Foo.Type == TypeWeHandle90POfTheTime -- inline Bar for TypeWeHandle90POfTheTime -- else -- call Foo.Bar(); using message passing. That's the way things should be done.

  198. java/jit speed vs asm by TummyX · · Score: 2

    So while it might seem profoundly counter-intuitive can anyone actually give me a good reason why Java + JIT should be slower than Good Programmertm + Assembler?

    There are many things that slow java down, that I'm sure you're aware of, but I'll outline some of them here as they come to me :).

    Limited stack, all java objects are created on the heap, and thus is considerably slower. Microsoft Research actually have a division working on speeding up java by putting some objects onto the thread stack.

    Runtime checking, array bounds checking etc can not be turned off. Good for stability, bad for speed.

    Late binding, can be rather slow :), it's always best avoided when using java, but sometimes you can't. This affects other languages too ofcourse.

    Coding practise. I don't know about many people here, but when I program in C, I usually know exactly what's going in memory, so I spend a bit more time optimizing this and that saving a few extra cycles, especially if the code is in a loop. I'm sure this is even more true of most ASM programmers. In java, I just don't care, I don't really know what's going on in memory, and I don't really care, I just do what makes the tidiest code, I also tend to use inheritance a bit more :) hmmm.

    Also, java's thread synchronization is easy, but extraordinarily slow compared to what you could do if you were to use Win32 for example, since you have a lot more control. A JIT is not going to help that. Come to think of that, that's where JIT can fail with high level languages, there are things that these languages restrict you from doing, which prevent such cool optimizations that a JIT couldn't possibly do - I'm not talking about instruction level optmization, I'm talking about things like being able to manipulate memory directly.

    The idea that Java + JIT can be faster than traditional ASM is interesting, but never will happen I'm afraid :P. It's nice throwing around O notations about how much more efficient garbage collection can be etc, but you have to realise than you're assuming that memory freeing is the most costly process, it may not be. With JIT, GC etc, you have the additional overhead of the JIT and GC. I haven't seen any proof so far that these overheads aren't overheads at all.

    There are additional 'optimizations' that can be done at runtime, but does using a JIT really improve performance (taking into account the overhead of the JIT)? How many cases are there when a compiler can not optimize the code as much as a JIT, and the JIT can actually give any kind of improvement. I doubt there are many, and unless the improvements are like 10% improvements, I doubt it would matter much. What's the diff between a 433Mhz and a 476Mhz celeron? Nothing any average user would notice. The only place you would notice the speed difference is when you're running something like VMWare, 3DSMax or compiling Linux....and in those cases would a JIT improve or just hinder performance?

    Ofcourse, as we get faster, and faster processors, we can move to higher and higher level languages like Java. They drastically reduce development times, and in many cases, will be the better choice. I doubt you'll see Quake or any OS written in Java anytime soon....hell, I doubt you'll see Office or CorelDraw written in java anytime soon ;).

    Crusoe is fast and super cool, but it's not as fast as a real 666Mhz x86. And if Transmeta adds official java support, I'd love to get my hands on one :)~.

    I should also stress here, I have nothing again'st java/jit etc. There're all cool stuff, but I think the concept of Java beating good programmer + asm a bit, uh, unrealistic :).

  199. Transmeta Speed by mohaine · · Score: 2

    If one Transmeta CPU, running the morphing software and actual productive code, is on par with a PIII 500 Mhz, what would the speed improvements be if you used 2 CPUs? One dedicated to morphing, and one dedicated to productive code? Design would be almost as easy as current design, and speed could double.
    Best part of this approach is that it still appears that your system has a single CPU. Win9x will still work.

    --
    (appended to the end of comments you post, 120 chars)
    1. Re:Transmeta Speed by Skratch · · Score: 1

      Unfortunately, the morphing is all done in software....

      --

      -- My neighbors dog has a four inch clit.
  200. FPGAs and overall program objective by ^me^ · · Score: 1

    Speaking of evolvable code, what ever happened to software FPGAs? I'd like to get my hands on some of that... Anyways, I've an idea. Why not include information on the overall objective of the code into the FPGA?

    --
    No one ever says, 'I can't read that ASCII E-mail you sent me.'
  201. Re:Q: Performance of C++ STL vs Java JGL ? by FutileRedemption · · Score: 1

    Thanx.

    Doesnt look too good unfortunately...

  202. Re:Compilers DO write better code than humans by FreshView · · Score: 1

    Actually, AFAIK, Id hasn't used assembly since Quake1. If you're interested Michael Abrash wrote a book called "Black Book of Graphics Programming" which details some of the tricks used in Quale.

    I thought that id stuck to C from Quake2 on. (I'm fairly certain the quake3 code is in C, and not C++, The Quake Source was all C). As far as games go, really the only place to use assembly was the inner loop, or the per pixel rasterization loop. That is no longer necessary due to hardware acceleration (Quake 3 requires it), and I don't know if everyone noticed this, but I found Quake2 software mode to be FAR, FAR slower than Quake's software rasterizer.

    I really don't think you'd gain enough to warrant assembly language in other parts of a 3d engine (bsp traversial, picking the potentially visible set), it's too complicated, the inner loop of the rasterization engine, OTOH, is very, very small, probably 30 lines of assembly.

    Anyway, those are my thoughts, I hope Carmack or Abrash is reading this so they can correct me on any points I missed.

    --
    -------- "All I want in life's a little bit of love to take the pain away" --Spiritualized
  203. Clueless Ranting by signine · · Score: 1

    Okay, admittedly I'm clueless as to the dynamics of JIT compilation, but from my relatively limited perspective, I do have a few comments about this concept of JIT compilers generating better code than a human.

    It makes sense to me that a JIT compiler would generate...ubercode. Of course a processor which is inimately familiar with itself would be able to optimize its activity given a certain situation to fit its own needs. It as if the processor reaches enlightenment.

    However, in the case of a Transmeta processor I wonder if the processor has to generate new code every time a non-native assembly program is run, how fast could it be. Is there some kind of cap on the speed of the processor due to its constant compilation?

    It seems to me, a person clueless in the tao of the processor that a coder, writing for a specific processor, would be able to create code that executes far faster than a processor generating code from a coders code on the fly.

    The degrees of seperation between the coder and the binary should be as small as possible.
    --

    --
    If there is a God, you are an authorized representative. - Kurt Vonnegut Jr.
  204. JIT are compilers by gmarceau · · Score: 2
    Almost everything that a compiler need to do to optimize code structure un undecidable. Human brains kick-ass at undecideble problems. Almost anyting that pertain to instruction scheduling is NP complete. Human brains kick-ass at NP-complete problems

    Once a programmer has infered some property about his program, he needs a highi-level language to express his solution to the compiler. ML's typing system is the best thing I know that acheives that.

    Once a program has figure a fast instruction ordering, he need a low-level language to express his solution in. We all know what kind of language does that.

    But I don't know any languges that combines both.

    As for todays' question : if JITs looks cool, and one time compiles looks boring, think of compiling n-times with a profile session in between and you will get much better result. (all the sugar, no r.time JIT overhead)

    -

    --
    This post was compiled with `% gec -O`. email me if you need the sources
    1. Re:JIT are compilers by scheme · · Score: 2
      Human brains kick-ass at undecideble problems. Almost anyting that pertain to instruction scheduling is NP complete. Human brains kick-ass at NP-complete problems

      People aren't really kick ass at undecidable problems. People aren't able to decide whether certain programs halt or not. If the program is trivial then a computer can decide whether it halts or not probably with the same effectiveness as a person. The undecidability problem just says that there is no algorithm that can figure out whether all programs halt or not. An algorithm may be able to decide whether a certain subclasses of programs halt or not.

      In addition, people most certainly do not kick ass at NP-complete problems. To solve any problem, we need an algorithm to do it and all such algorithms are nonpolynomial. If anything people would be slower at solving these problems since we can't compute things as quickly as computers.

      --
      "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
    2. Re:JIT are compilers by seeken · · Score: 1

      While we probably couldn't solve a problem faster then a computer, maybe we're more efficient about it... We can make better guesses than a computer, we don't have to apply an algorithm blindly, and we can appreciate the whole problem at once..

      Surfing the net and other cliches...

      --

      Surfing the net and other cliches...
      (Who Meta-Meta-Moderates the Meta-Moderators?)
    3. Re:JIT are compilers by gmarceau · · Score: 1
      Sorry, I do assume something here:
      The only admissible attack to NP-complete problem and undecidable problem is estimations.
      This is also what happens when so start coding for one of those problems, you have come up with some estimation heuristic.

      Any other approaches won't realy work for NP-complete problem (not until the arival of the quantum computer at least); even more so for undecidable problems. This is what brain excel at: cooking up heuristics. This faculty even has a name of its own: Intuition. Moreover, if you spend enough time developping your intuition of a subject, you will get very good at focusing your thinking time on the good solution threads.

      I would say this is what realy can out of the Kasparov vs Deep blue chess match. It gave us an idea of how much computation time we save ourselves by being very good heuristic cooks.

      -

      --
      This post was compiled with `% gec -O`. email me if you need the sources
  205. Why not? by MikeFM · · Score: 1

    I work w/ AI and computer language theory both as hobbies and see no reason why code can't be evolved. True, it'd take some effort to do it well but the main point is the more you ran the compiler the better it'd get as it'd learn more and more tricks. First I assume it'd jsut compile a base version.. the same version any old compiler would give you.. then it'd begin trying various optimizations and as it went it'd learn what works best in certain situations and even begin to make it's own optimization combos. After it gained enough experience it could do the job far faster than any human. I've played with this some, mostly in running interp'd languages of my own design and it seems to work fairly well. I've also wondered if you couldn't create something similar for file compression. Might be interesting to see what you get.

    --
    At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    1. Re:Why not? by ASM · · Score: 1

      Sorry, I did not make myself clear enough. The point is that if by some means (such as those mentioned in my previous post), the compiler learns bad optimization techniques, then it is possible, and even lileky that its optimizations will notn work. Instead, when the compiler tries to optimize code, it will generate binary code that is actually slower/bigger, buggy (code that does not handle every possible input, for example), or even just plain wrong.

      If that occured, then debugging would be even more difficult than it already is. We would no longer have to worry about errors in our own code, but errors in the compiler's output. We would have to decompile/reverse engineer our programs before we could ever know the source of the error- and even at that, disassembling would prove difficult, because we would have to figure out how the compiler mangled our code, instead of knowing what to expect, based on what we initially wrote.

      --
      Fish
  206. GCs can improve performance by josu · · Score: 1
    About your third point. Garbage collection need not be less efficient than manual memory management. GCs have two (at least) advantages:

    • They can avoid costly system calls. Asking the OS for memory and telling the OS you're done with that memory has some overhead. If memory management can be done without making system calls, time is saved.
    • Manual free()ing of memory involves touching every piece of memory one last time just to relinquish it. Some garbage collection schemes like stop-and-copy (and it's more complex sibling, generational collection) avoid this last touch by only manipulating memory that is still in use.
    You'll still get occasional pauses due to GC, but the amortized performance of your program can be better than manually collected code.
  207. Don't take it personally man (nt) by Andrew+Cady · · Score: 1

    x

  208. Re:Self-Modifying Code by Some+Strange+Guy · · Score: 1
    Not at all true... self modifying code is still around not only on JIT compilers but in interpreted languages and even device drivers (Or how do you think some device drivers adjust in real time to hardware changes?). And it's not a dangerous technique in itself, but (As usual) when executed improperly it can be a wild beast (As most people discovered when thy installed their first Cyrix and AMD 486 processors and discovered most x86 code wouldn't run properly because of cache integrity issues)

    Interpreted languages generally don't use self-modifying code. And it's not exactly a common technique in device drivers either. How many instances do *you* see in /usr/src/linux/drivers/*? Adaptability doesn't imply self-modification.

    First of all, Harvard machines require separate instruction and data pipelines and memory spaces, and 99% of the CPU's on the market (General, Embedded, etc.) are Von Neumann machines which use a single space to store instruction and data. The thing is, newer CPU's (Even if they're Von Neumann desings)have separate caches for data and instructions and most self modifying code violates the cache integrity (See above) because the code is modified in the cache but never stored back into memory (Unless you use a write-through cache design which is impractical from the performace point of view).

    This is probably more nitpick than anything else, but it's legitimate to refer to machines with split I/D caches as "Harvard architecture". The original meaning of completely seperate memory for I and D is not terribly relevant anymore and the distiction at the cache hierarchy level is fairly useful. Not many people talk about the IBM Mark series in anything other than a historical context. :)

    BUT (And that's a big BUT :) if your JIT VM forces periodic cache writes to RAM (Not every time but enough times to ensure some sort of coherence if the cache and RAM are out of sync) you lose a bit of performance but gain a stabler setup.

    To be correct across architectures, your JIT has to insure that you'll NEVER see the old icache data after you store to your program memory. The simplist way to do this is to force an icache flush on every write to instruction space. There may be a way of dynamically tracking this and doing some sore of Just In Time flush, especially if the processor doesn't offer any sort of selective flush in the icache. This could probably be done by invalidating a page with a special flag which causes a flush on TLB miss.

    The point is largely moot though. I have serious doubts you'll ever get a consistent performance gain out of updating code on the fly; better to finish execution with a given translation and then go back and tweak the code ordering/generation based on profiling. In that case, flushing the icache is more or less free, from a performance perspective.

    But even run time profiling is dubious at best, methinks. There's very little benefit in this space that can't be gotten more easily and at less cost from a combination of static compiler optimizations and dynamic architecture adaptations (e.g. branch prediction).

  209. Experience from Optimizing Java ... by joe_fish · · Score: 3
    The theory looks sound, but one of the biggest things to do when optimizing any Java program is to minimize the calls to new().
    I recently sped a program up by 150% in a snap simply by killing a few new()s. Swing and LotusXSL have had similar experiences.

    I think that part of the problem is that all of this is new, so there is more to do. HotSpot the trendy JIT from Sun in places IS ALREADY FASTER THAN C, but whenever it comes to Object creation, things slow down a lot.

    So why in theory should new() be quick when in fact it is slow? IMHO the problem is not with the memory claiming, its with all the other stuff that the JVM has to do.

    When I call new Foo() the JVM:

    • Checks to see if the bytecode for Foo already exists, and if not it loads it, verifies it, and calls the class init method.
      This is very very slow, but should only be done once.
    • Allocs new memory. Probably very quick
    • Calls the hierachy of constructors of all Foos superclasses. Quite slow.
    • (For advanced garbage collectors) Place the object on the 'recent items' list. Probably quick
    So I guess the complexity of the system as a whole is the problem here.
    1. Re:Experience from Optimizing Java ... by ucblockhead · · Score: 2

      Yes, but the key is that if you allocate from the heap, you have to search some sort of free memory chain to find something. This takes time. To allocate something off of the stack, you just increase the stack pointer.

      If you do benchmarks, you'll find that this (At least, you would have last I checked, which was with an old DOS compiler):

      Foobar f1(1,2,3);

      is dramatically faster than this:

      Foobar* f1 = new Foobar(1,2,3);

      for the reasons mentioned above.

      This is because the former looks something like this in psuedocode:

      f1 = StackPtr
      StackPtr += sizeof(Foobar)
      f1.Constructor(1,2,3)

      while the latter generates something like:

      Start at beginning of free block chain
      for each block
      is it big enough?
      if so, is it too big?
      if so, split it and return beginning
      otherwise return it

      f1 = returned block
      f1.Constructor(1,2,3)

      Where this really gets compelling is if you a class has member classes which have member classes, if those members are all allocated, you end up with lots of little chunks of memory. On the other hand, if they are all regular members, and not pointers to things that are allocated, they just become part of the parent object's size.

      If you have this:

      class A
      {
      class B {} b;
      class C {} c;
      };

      then this line gets memory from the stack merely by incrementing the stack pointer by sizeof(A):

      A a;

      This is basically one assembly op which, unless there is a page fault, is nothing, timewise. On the other hand, this:


      class A
      {
      class B {} *b = new B;
      class C {} *c = new C;
      };

      Created like this:

      A *a = new A;

      Requires three searchs through the heap to find memory. Not only will this take more time, but this is more likely to generate a page fault, both because you might not find contiguous memory, but also because each heap object has overhead associated with it (the list of free and used memory blocks) while each stack object usually does not.

      (The other reason to prefer the former sort of class is that even if you decide to use the heap for the owning class, you use a bigger chunk, allocating only once for the class rather than for each memory object.)

      --
      The cake is a pie
    2. Re:Experience from Optimizing Java ... by Anonymous Coward · · Score: 0

      I thought that the hierarchy of constructors only were called if you were to specify this by calling super() within your constructor. Is there any reason why all of the superclass constructors need to be called when instantiating a single object?

    3. Re:Experience from Optimizing Java ... by Otis_INF · · Score: 1
      When I call new Foo() the JVM:
      • Checks to see if the bytecode for Foo already exists, and if not it loads it, verifies it, and calls the class init method. This is very very slow, but should only be done once.
      • Allocs new memory. Probably very quick
      • Calls the hierachy of constructors of all Foos superclasses. Quite slow.
      • (For advanced garbage collectors) Place the object on the 'recent items' list. Probably quick

      Well... Don't forget: you live inside your virtual machine. it's not Native Java On Your Pentium System. Creating a new instance of an object is implemented slow in the JVM, perhaps because it doesn't use a heap. If the JVM needs extra money for the object to create it first has to allocate new memory for itself, then it has to assign it to the object inside the JVM. it's two steps where 1 is enough. That's the evil of emulation. An inner loop optimalisation with a JIT can be faster sometimes (I've seen some 'test' programs that should prove a Java JIT is faster than a C compiler. All these programs did was run some for() loop a zillion times.), but only in small area's. (Today).
      --
      Never underestimate the relief of true separation of Religion and State.
    4. Re:Experience from Optimizing Java ... by ucblockhead · · Score: 2

      One huge advantage C++ has here is that you can declare on object on the stack. This makes your fourth bullet go away and turns the second bullet into something like two asm instructions.

      (And, of course, there is no bytecode to load, so the first bullet becomes merely "call the constructor", whether the stack or the heap is used.)

      Even in C++, the heap is overused. With moderately careful design, the majority of objects can be left on the stack. I'm convinced that one of the reasons that people think C++ is so much slower than C is that many C++ programmers don't think about this enough.

      Calls the hierachy of constructors of all Foos superclasses. Quite slow.

      What strikes me here is that part of the problem is these huge object hierarchies that people seem to love so much. A very flat hierarchy reduces this quite a bit.

      --
      The cake is a pie
  210. Sounds good, but... by randolph · · Score: 2

    I once took a course with Dr. Alfred Aho, one of the authors of The Dragon Book and a contributor to lex, yacc and awk (he's the 'a' in awk.) Dr. Aho would never call a compiler component an "optimizer;" instead, he called it a "transmogrifier," because the only way a compiler could produce truly optimal code would be if it could write an optimal program--and it can't.

    While the compiler has the opportunity to rewrite the code every time it processes it, it's not very good at rewriting code, so its abilities are limited.

    A well-designed compiler on a well-designed CPU can produce very good assembler code--I doubt I could outsmart a compiler in a RISC assembler optimization problem--but I believe that code tweaking is usually of value only in inner loops and for certain kinds of data storage.

    All of which adds up to the reality that, pending the arrival of artifical intelligence, human programmers are still needed.

  211. JIT execution time adds up to program exec. time. by Otis_INF · · Score: 1

    A program that uses a JIT compiler can perhaps execute faster AFTER the JIT has ran, but the JIT takes away execution time. The MORE optimalisation you do in a JIT the more time it consumes.
    The weird part is: the JIT's we know today don't store their compiled code. Why not? Ok, some optimalisations are part of the executionpath the program takes based on the actions and reactions of the user, but a lot of stuff is not.

    Interpreted languages have the disadvantage that the result of the interpretation is lost. A JIT solves that (partly). The old discussion about whether a true native compiler can generate code that is faster than the code interpreted and compiled on the fly with a JIT, is a discussion that's not closed or won by any part. This is because there is no scientific evidence that proves any side is right (yet).

    What we KNOW today is that a native compiler can generate code that is, generally speaking, fast on the target CPU. Faster than the interpreted version. We also know that the compilers of TODAY can't always predict what's the most fastest code to emit. the optimalisation schemes are very clever but not clever enough. We know that some optimalisations can only be done at run time when there is knowledge what to avoid and what CAN be optimized.

    It's thus a mixture of both. However, as I said before, the addition of a JIT to a native compiled codepart, as can be done in a CPU, also adds CPU time. It's thus up to the JIT developer to keep the time added by the JIT lower than the gained speed :) This is sometimes the case, in other situations there can be a lot of CPU time lost to the JIT, because code is just executed once or twice.

    Besides the technical aspect: what's really important? if you want platform independance (hardware wise for example), what are you willing to pay for that issue? speed in execution? a high price in software?

    If speed is the most important thing, emulation and JITs are TODAY not an option. Perhaps tomorrow or next year, when there is more information available about what is faster in certain situations: a CPU with a JIT, or a JIT with interpreted code or just a fast CPU with native code.

    --
    Never underestimate the relief of true separation of Religion and State.
  212. Stupid slashdot by Scott+Wood · · Score: 1
    I put a

    between the quote and my text. It showed up that way in preview. Slashdot ate it when I submitted.

    --

  213. dammit by TheDullBlade · · Score: 1

    you can guess what went wrong...

    --
    /.
  214. Java VM is a Stack Machine by happybob · · Score: 1

    One of the reasons it is very tough to make java fast is that the byte code is designed for a Virtual Machine that is a Stack Machine!

    A Stack Machine of all things! There's a reason that no modern CPUs are stack machines.

    Last year, I had to write a java compiler for my compilers class. I was simply disgusted by the stack machine nature of the VM.

    Instead of doing actual, meaningful work, the stupid VM spends loads of time wailing away at the stack, pushing, pushing, popping....

    Stupid, stupid design for anything other than braindead little chips. But wait, that's what Oak (Java's precursor) was designed for. Go figure.

    Once again, marketing has given us a solution in search of a problem.

    scottwimer

    --
    -- Beer. It's what's for breakfast.
    1. Re:Java VM is a Stack Machine by Anonymous Coward · · Score: 0

      Just because the byte code is stack based it doesn't mean that VM is a stack machine! If you lifted up the bonnet of any respectible JVM you'll find it uses platform native address codes and register colouring. The stack based language allows for the code to be very small and easily optimised.

      I need more power for the brain machine!

  215. The C feature with "int" by TheDullBlade · · Score: 2

    The basic data types are defined. You can use sizeof() to generate 100% portable code when you really have to know exactly how bit your ints are. More importantly, you can use your noggin and avoid it being a problem in most cases by writing code that accepts the range of possible value limits.

    Embedded systems programmers will back me up on this one: it's a Good Thing that when you write C you can write one piece of code that's perfectly happy to use a 16-bit, 20-bit, or 32-bit int and it will be equally efficient on any of those systems.

    --
    /.
    1. Re:The C feature with "int" by AndrewHowe · · Score: 1

      That's cool as long as you never want to store a number larger than 32767 or 65535u...

    2. Re:The C feature with "int" by ucblockhead · · Score: 2

      The trouble is that you can't always use sizeof() and you can't always change your types around. The classic case (which I ran into, painfully) is when you have two different machines with two different architectures passing structures around.

      It is easy enough to define a structure like this:

      struct
      {
      int Foo1;
      int Foo2;
      };

      and yes, regardless of the machine, this will compile and run most anywhere, assuming that Foo1 and Foo2 are never larger than 2^15. The trouble is, if you are using that structure two places, one on an 8-bit machine, another on a system that is being upgraded from a 16-bit OS to a 32 bit OS, well, you have pain. (An unfortunately all too real situation I got stuck with.)

      Now in my situation, you can certainly say that the idiot authors should have used shorts (and they should have), but even then, there is no guarantee that a short will be the same size in both places. I foresee a lot of pain when sizeof(long) stops equalling sizeof(int) as we move to 64 bits.

      So what you end up doing is having the structure defined twice, potentially differently, for each side of the pipe. Not fun to maintain.

      (And don't get me started on packing.)

      --
      The cake is a pie
  216. G.N.U. Grope.... by Jonathan+Hamilton · · Score: 1

    I went to the 1998 Atlanta Linux Showcase and Miguel (The most hyperperson on earth.) Was teaching a class about GNU GROPE. It was suppose to be a optimizer for code similar to the SGI rope. I'm not a programmer so I don't understand exactly what it does but Miguel said that using GROPE could lead to a 40 precent memory reduction under gcc. Has any one heard about this? If so what ever happend to it.

    I decided to throw this in after all the other talks on optimization and compilation.

  217. Self-Modifying Code by Tune · · Score: 3

    I believe Andy has a good point in arguing that there's no fundamental reason for manually written assembly being better than automatically self-optimizing stuff. I also believe that manually written assembly will ultimately become obsolete.

    Nevertheless I'd like to point out that JIT is based a somewhat dangerous technique: a program that alters (its own) code. I believe this technique was used in the eighties to scare off hackers by making code incomprehensible and hard to disassemble until the program was actually running. Also (even on a good old 6502 processor) it's possible to make some speed improvents peeking and poking into the code you're actually executing.

    When compilers for Microcomputers got faster and most processor architectures (known as Harvard architecture, I believe) explicitly require a division of RW and RO memory segments, self-modifying code was abandoned. Hacking into your own code is generally conceived as bad-programming-practice.

    I was wondering wheter JIT techniques also require very intelligent (and thus "heavy") disassemblers? One might also expect that developing a JIT compiler is a lot harder than doing a conventional one (without peep-hole optimization). Does anyone have experience in these subjects?

  218. Microkernels are dead? Does Apple know that yet? by Anonymous Coward · · Score: 0

    Nothing wrong with microkernels, the fact that 99% of research was done on the fatally flawed Mach is what did them in.

  219. The whole Transmeta Morpher vs JIT. by Anonymous Coward · · Score: 0
    Ok, I just feel like a few things have to be stated on this.

    (1) I am an assembly language programmer (not 100% or all the time, just that I can when it makes sense for me to do so.)

    (2) I've never met a compiler I could not beat in terms of performance, except in trivial cases or with significant hand holding.

    (3) A JIT sees optimized bytecodes which are not necessarily optmized for any particular target platform. Thus a JIT must necessarily reschedule code from scratch.

    (4) Transmeta translates code which to some extent or another has already been optimized for, presumably, the Intel x86 architecture. Thus Crusoe has less high level work to do, but much more low level work to do.

    (5) Crusoe has a very important hardware feature that allows speculative memory dispatching to work around potential aliasing situations. Thus it will be very important the Crusoe's morphing code take advantage of this feature. This solves a problem that has plagued other CPUs and compilers for years now. A JIT has not analogous thing that it can do in this situation.

    (6) Calls to new() are not the biggest penalty to the garbage collection memory model. Its the ref count maintenance that ends up all over the place. Besides, there is no way that the could lead to a speed up over the malloc()/free() method which could also use a stack and sub-heap method for memory management.

    (7) The Java standard has problems with it that no JIT can solve. The primary example being that the floating point standard is strictly bit compatible with UltraSparcs, but not compatible with either Intel's IA-32 or Compaq's Alpha floating point units. Hence UltraSparcs will always beat x86's and Alpha's at floating point because they have to emulate a Sparc's FPU rather than using their own, and JITs can't do anything about this. But a perusal of www.spec.org should convince you of what a crock this is in native code. Transmeta, should, in theory, not have this sort of problem.

    (8) Java has bounds check on its arrays. A compliant JIT cannot remove this check. Crusoe doesn't have anything comparable that it needs to worry about (it merely has to emulate exceptions properly.)

    Just my $0.02
    --
    Paul Hsieh
    http://www.pobox.com/~qed/mailme.html

  220. transmeta comes with new programming enviroment. by Anonymous Coward · · Score: 0

    i think so

  221. Just wondering by square · · Score: 1

    Can the crusoe chip morph efficiently when, for example, a C program invokes a java program, eliminating the "virtual machine" paradigm?

    HFF

  222. Report interviewed by Overseas Europe by bbeurope · · Score: 1

    Overseas Europe starts its coverage with a Strong Buy recommendation on SFI - Southland Financial Inc. -=(SINC NASDAQ BB)=- it places a target of 25$ by year end 2000. *Website http://www.sincsfi.com/ (Under devlopment) *Outstanding shares Details* SFI has a total of 16,200,000 shares issued and outstanding, of which 3.5 million are publicly traded and 13,100,000 are restricted, *Company Overview* SFI aims to be the premier, independent communications 'network service provider' in South East Asia. SFI will provide cost effective, high quality, value added, last mile qolutions to its customers. SFI opportunities are driven by new technologies including : - Video On Demand (VOD) - Music On Demand (MOD) - MP3 technology, MPMan - Home Shopping and Home Banking - Videoconferencing, Videocasting, Videostreaping, Telemedicine - Interactive Network Games - Internet / Fast Internet Access Products SFI also closed a deal with TEEL (Top Express Engineering Limited) which is expected to be worth $300 million in the next five years in terms of both existing and new infrastructure. SFI recently has applied for the International Gateway License in Hong Kong, OFTA (http://www.ofta.gov.hk/index_eng1.html), we expect news at the end of January, early February. SFI holds the landing rights to Project Oxygen for Hong Kong. (www.projectoxygen.com), (Project Oxygen will be one of the biggest fiberoptic projects in the world connecting 76 ) (countries,Project Oxygen dwarfs any other Line slated for coming into Hong Kong. ) SFI recently signed a letter of intend with CCF STARBRIDGE to apply and enter in the Internet, Intranet and E-commerce area in Mainland China & Overseas (http://www.ccfstar.com/) SFI is presently finishing a form 10 QSB with the S.E.C., In conjunction with receiving an approval to a fully reporting status and the finalisation of project funding, the Company intends to make an application for a NASDAQ listing and is presently in final negotiations with qualified and experienced individuals to augment the board. Further details of which, will be made available shortly. SFI recently signed a mandate with Macquarie Bank (www.macquarie.com.au/oc/) Macquarie Bank Limited of Australia will advise SFI on its project to establisch $200 milllion international telecommunications gateway link into Hong Kong, UTI (United Telecom International ltd) has obtained landing rights in Hong Kong for the $14 Billion global marine cable company, Project Oygen, Macquarie Bank's mandate with SFI includes the requirement of Project Oxygen completing its network into Hong Kong on a timely and cost-effective basis. *Latest Chart http://www.bigcharts.com/quickchart/quickchart.asp ?symb=sinc&sid=0&o_symb=sinc *Latest News Releases http://biz.yahoo.com/bw/000118/southland__1.html http://biz.yahoo.com/bw/000112/southland__1.html http://biz.yahoo.com/bw/991202/southland_1.html http://biz.yahoo.com/bw/991104/southland_1.html *Board of directors & management David A. Turik, President and Chairman of the Board Mr Turik brings a wealth of telecommunications expertise to the Company having previously held senior executive management positions with Australian publicly listed companies, NetComm Australia, Telstra, (the former national carrier, Telecom Australia), AAP Telecommunications and Spectrum Network Systems, as well as Toronto Stock Exchange listed, TSB International. In the past 5 years, he has predominantly focussed on Business Development, Mergers and Acquisitions and project specific capital raisings. Mr Turik has held an interim role of acting President and Chairman for the Company and was responsible for identifling and negotiating the Company's telecommunications infrastructure based project in Hong Kong. Mr Turik has held many private board appointments across a wide range oftelecommunications related organisations. Martin Dougherty, Non Executive Director Mr Dougherty has had a long and distinguished career in the media and communications field having held senior executive positions with both the John Fairfax Group and News Limited. Mr Dougherty has also held Board Directorships with the John Fairfax Group, the Australian Associated Press, (AAP) and David Syme Pty Ltd, (publishers of the Melbourne Age and Business Review Weekly). For over 17 years, Mr Dougherty has also been Chairman of Dougherty Communications; public affairs consultants, specialising in strategic public relations, issues management, telecommunications, Government and media relations. This company became a member of the Ogilvy Public Relations Group. Mr Dougherty was a member of the International Management Group of Olgilvy Public Relations during the 1980's. He resumed private practise as a consultant in 1989. Willie Lo, Executive Director, Chairman of United Telecom Inc, (Hong Kong) Mr Willie Lo is a veteran of the Telecommunications and Information Technology industries. He has held senior technical management positions for over 18 years, with Hong Kong Telecom, Telstra, (Telecom Australia) and PRACOM Pty Ltd. Mr Lo has successfully managed a broad range of projects from cable network roll-outs to systems development and implementation. For 8 years, Mr Lo was a Project Manager for Hong Kong Telecom, managing the deployment of major cable networks. Mr Lo was the IT Manager, Media & Broadcasting, for Telstra, prior to becoming IT Manager for PRACOM and an Executive Director, for Pacific Communications Research P/L. Mr Lo holds a Masters Degree in Systems Engineering, (RMIT), Diploma in Business Studies, Hong Kong Polytechnic and Bachelor of Applied Sciences, University of Melbourne. Robert Talbot-Stern, Non Executive Director Mr Talbot-Stern, B.S. Econ. (Wharton), J.D. (Penn.), LL.M. (London), has had a distinguished corporate, academic and public career, having been Group Counsel for UNISYS and Assistant General Counsel for Chrysler. While at Chrysler Talbot-Stern held Board directorships with Mitsubishi Motors and Peugeot, while coordinating Chrysler Canada's rescue effort. Mr TalbotStern is on a current White House Task Force on Deregulation and Competition, has been a guest columnist for the Australian Financial Review and business commentator for CNBC-TV. He has worked as a Management and Legal consultant in affiliation with consulting and law firms in Washington (National Academy of Sciences, McKinsey & Co., Herzfeld & Rubin, and Boston Consulting Group). He has been a past adviser to the Business Roundtable and Federal Reserve. * Comment of Overseas Europe Staff Note that : - China population 1.850.000.000 (1998). - Hong Kong population 8.800.000 (1998). - ITU (International Telecommunications Union) forecast global telecoms industry to be 1.5 Trillion USD by 2000-2001. - The recently signed WTO will have a very postive impact for SFI on long & short term. SFI will announce a new PR company who has a very good reputation on Wall Street. Overseas Europe believes that the company is way undervalued and will update your further on upcoming events and news releases. For more info on SFI send a email at overseasii@yahoo.com.

  223. Evolve code. by Anonymous Coward · · Score: 0

    Compilers, programers, coders, and physicists and all interested in private communication need to look at the 'Information to Data Compiler' dialogue of Fox Forums: Fred Langa; Techno File: Physics.

  224. NNNG! "that the allocator" . " would do" by TheDullBlade · · Score: 2

    I hate when I skip words.

    --
    /.
  225. Languages that accept "hinting" by jekk · · Score: 1
    What languages/compilers DO support this? It's quite obvious why c/c++ don't allow it... you're practically writing in machine language. Likewise for Java. But something like perl, or smalltalk doesn't permit it either. Neither, so far as I know, do languages such as Eiffel and CLU (academic & experimental languages in most people's eyes). Prolog does, but that seems to be a sort of a special case.

    So if these languages don't support hinting on a language level, what does? And why don't they?

  226. Cause or effect? by DragonHawk · · Score: 2

    The point is that the average application has inflated enormeous over the years.

    So has the average machine. Which is cause and which is effect? Do we have bigger applications because with today's faster processors and bigger memories, throwing more hardware at the problem is cheaper then trying to bum five bytes out of a 50 KB program?

    Something to think about, anyway.

    --

    dragonhawk@iname.microsoft.com
    I do not like Microsoft. Remove them from my email address.
  227. I think Linux does this by DragonHawk · · Score: 2

    Perhaps another malloc comes along wanting the same size (roughly) as the thing just freed. We just give it the recently freed chunk. We then don't have to build a new one.

    IIRC from what I have read about the Linux kernel, it does something very like this. Blocks of free memory are kept track of in a hash table, where the hash key is the size of the block. When you free a block, it gets merged with any adjacent blocks and put in the appropriate spot in the hash table.

    Or it might have been an array of lists of blocks, where each element in the array represents a list of blocks twice as big as the previous element's list. It's been awhile; I forget. :)

    --

    dragonhawk@iname.microsoft.com
    I do not like Microsoft. Remove them from my email address.
  228. It's all in the implementation by Steeltoe · · Score: 1

    As all interesting questions this is a matter of how things are implemented. JIT isn't automatically faster than one-time compilation. Without optimization based on runtime data, platform information and profiling, JIT will be slower than compiling by default. Because JIT includes a significant overhead by converting, profiling and optimizing the program. Anyone who has compiled some big programs knows what kind of overhead we're talking about here. Especially with full optimization, there are worlds between time-consumption figures.

    My proposal for JIT compilers is to _store_ and rerun the bytecode and profiling information for later. It should be possible to further optimize already compiled code, especially in much-visited code. So that over time, you can spread the optimization efforts to when you have CPU cycles to spare. As a result first-time execution will not be as painfully slow as it is now, and you will still appreciate fully optimized code after a while, yes, even better than compiled code. This really makes sense with runtime profiling.

    For the best average effect in every scenario, I would recommend an automatic (perhaps weighted?) hybrid between interpretation, JIT and compilation. Also, leave the final choice up to the users of the system.

    For all I know, this might already be implemented in some JIT compilers. Like I said in the beginning, it's all in the implementation. The use of hardware in Crusoe seems very promising for JIT, runtime profiling and platform-independence in general.

    For games, I would recommend an ability (choice) to precompile Java programs and store the entire bytecode. Just because I meet a new type of monster in JavaQuake shouldn't mean my demise (unless I forget to strafe).

    - Steeltoe

  229. Nothing new... by Anonymous Coward · · Score: 0

    You're absolutely right. All of these ideas are already being used in Sun's Hotspot runtime for Java. There are some academic papers which claim that properly-tuned Generational garbage collection can be faster than allocating storage on a stack. It will just take a little time to see if commercial systems can replicate this. IMHO, it's silly to be so worried about speed. For many applications, the time it takes a programmer to write something is much more expensive to CPU time, and anecdotal evidence suggests Java programmers are 40%-100% more productive than C programmers. Game Boy Camera Gallery and Tutorial

  230. Good Point by Anonymous Coward · · Score: 0
    These are good points and supported logically.

    Unfortunately, they are somewhat counterintuitive and go against certain popular beliefs. Don't you know that Java is slow (even with a JIT) and will always be slow? You will after a few hundred flames.

  231. A Transmeta Java Chip is pointless by FutileRedemption · · Score: 1

    The only reason why Transmeta do the translation transparently is the instant support for the x86 OSes this gives.

    For Java you dont need this. A full blown Java VM application gives better Java performance, because it basically does the same, but with less system-specific constraints.

  232. BUMP up this post's rating by Anonymous Coward · · Score: 0

    well made points.

  233. Bless you! by aheitner · · Score: 3

    @#$% theory.

    I can write fast C (or C++ if you stay away from the evil slow features) code.

    Everything I do in Java is bloody slow.

    I do believe fundamentally that some sort of per-machine compilation is in fact ideal -- the C compiler could do clever stuff if it new the exact cache architecture of your machine and compiled your programs from some intermediate (say parsed and desymbolized) representation when you ran them (or maybe just the 1st time you ran them).

    (all the slackware people in the crowd say "bwaahahahaha! I did compile my programs for my own machine!" :-)

    But I think the dumbness of Java -- the funky GC, the dumb dumb dumb RTTI, will always make it slow. For that matter, Java uses a dumb fake-o instruction set that's not a particularly good representation of the information needed by a compiler. Why not give your JITC something closer to the source rather than a binary for a machine that doesn't even exist ?!?

    ...

    Ditzel said something very significant when he introduced Crusoe: (i'm paraphrasing) "The reason we haven't talked openly about this before is we didn't want to boast before we could prove we've done it". Ditzel stood there and said "We've got a chip that does on-chip JIT architecture translation, and does it to run as fast as any other chip out there, and here's the proof".

    The Java people have been claiming for a long time that they could write faster code than the C compiler with JITC.

    Java has gotten better than when it started, it's true.

    But it still can't hold a candle to gcc -O2.

    My message to the Java world: I spend too much on my computers to waste time running slow code. Call me back when you can actually demonstrate your claims.

  234. Limitations of compilers/translators by stevelinton · · Score: 1

    The basic reason why compilers are slower is a programmer can use information that is simply not in the program in their optimization.

    In one (in)famous example, a quick, but, in general unsafe conversion was used in part of the Ariane 4 control software, justified by a mathematical formula present in a comment which proved from the geometry of the spacecraft that it was, in fact always safe. [ sadly, Ariane 5 had different geometry.....]

    A more down to Earth example is a loop that you may know will never execute 0 times, which would allow you some optimisation.

    Profiling JIT compilers can do somewhat better, in that they can, at least, note that condition X has never happened to date, and produce code that handles it as an expensive exception, while concentrating on keeping the main path fast. The Transmeta stuff seems to do a lot of that.

  235. MIT compilers do bitwidth analysis. by cananian · · Score: 1

    This was actually a topic of my master's thesis. More information on my FLEX compiler project. Both the RAW and the FLEX groups here at MIT have compilers that do bitwidth analysis.

    --
    [ /. is too noisy already -- who needs a .sig? ]
  236. Only when you can rewrite the whole thing by p3d0 · · Score: 1

    The argument was that compilers can be better because they can rewrite the whole thing each time. Ergo, if you are also willing to rewrite a given piece of code whenever certain conditions change, then you can beat the compiler.

    However, for situations where you can't change the code every time anything changes (eg. it's too big), my bet is on the compiler.

    --
    Patrick Doyle

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  237. No, that's not it... by p3d0 · · Score: 1

    No offence, but that's a really lame memory management scheme. Nobody would ever actually implement free() that way.

    My guess was that Andy was referring to data flow analysis techniques that can determine that an object is never used once a certain function ends. Then, the object can be allocated on that function's stack, with very low overhead.

    --
    Patrick Doyle

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  238. One thing is missing from the discussion. by anatoli · · Score: 1
    The thing being that Transmeta's technology involves optimistic compilation. That is, the compiler (morpher) can happily assume no errors will happen during the execution. If errors do happen, it will just rollback and re-execute the whole sequence (procedure or whatever) without optimistic assumptions. Note that any infrequent condition can be called "error".

    To match this performance a traditional compiler would have to compile everything twice, once normally and once optimistically, thus doubling the executable size. Of course the CPU must have this rollback capability in order for things to work.

    Moderate this down (-1, Just Plain Stupid)
    --

    --
    Industrial space for lease in Flatlandia.
  239. re-compile the compiler by milliyear · · Score: 1

    And to take this argument to it's logical (for me) conclusion:

    The best setup would be to constantly re-compile the compiler, and for that matter, the kernel, too. (NOT!)

    Compiling once for a given hardware set yes, but not re-compiling for every execution. I'm guessing that somewhere in the neighborhood of 50% to 90% of the compiled code WOULD NOT CHANGE from one recompile to the next, and I just don't think you'll be able to make up for that kind of wasted cycles by optimizing the rest of the code.

    However, if you do, I'll be the first to write you a check!!!

  240. Re:Java specifics by Anonymous Coward · · Score: 0

    You are wrong about the inlining. Sun's Hotspot VM can do "optimistic" inlining and rewrite the code if classes are loaded later.

  241. Does not add up by Tune · · Score: 1

    It really does not matter whether this preprocessing / VM translation is done in software by a very fast RISC like processor or in microcode, as in a Pentium-like architecture. Technically I do not see a fundamental problem here.

    There is a practical BONUS, though: since the interpretor is not in the actual silicon, you don't need to worry about buying a chip that possibly has lots of small bugs that need program patches and workarounds for every single program you bought!

  242. Doesn�t work in practice by FutileRedemption · · Score: 1

    Its wrong to have the most basic data type wobble over the bit length.

    Most people dont think about this when writing their programs. This is a failure of C, because programming languages are for the people.

    The solution would have been to have the basic data types with fixed size, and additional data types with "some size between 16 and 64 bit giving the optimal performance on the target machine".

    This would have saved many people a lot of grief...

  243. Re:Isn't this REALLY the smarter compiler arguemen by MikeBabcock · · Score: 2

    One of the areas that run-time profiling adds to the mix is the ability of the code to adjust "itself" based on processor cache sizes, etc. If I run a program on my K6-200 and then on a K7-700, the program should have loops optimised very differently. That is, my chip has very little cache in comparison to the K7. The K7 could have a large number of long instruction sequences unrolled that would be in L1 cache whereas it would be smarter, if branch prediction works out (again, run-time profiling) to keep those loops rolled up on my CPU.

    --
    - Michael T. Babcock (Yes, I blog)
  244. Sounds like Lagoona, only less sophisticated. by treyb · · Score: 1
    If you're interested in some leading edge dymanic code generation, take a look at Lagoona. To get the good stuff, you'll have to download and read the PDF files and poke around the site a bit. Here are the highlights:
    • The "object" file format is a platform neutral semantic dictionary encoding that is translated at load time to native code.
    • Doing this allows them to insert arbitrary features into the generated code, like profiling, array bounds checking, etc.
    • The first translation is "quick and dirty", with profiling enabled.
    • Once hot spots are identified, that code is "recompiled" with higher optimization.
    • Because they use a garbage collection system and have complete access to the program semantic information, they are able to dynamically re-order data structures to reduce cache misses!

    All in all, quite interesting work, and quite worth the read.

  245. compiler smarter than humans? by rmstar · · Score: 1

    I dont think so.

    I have not seen one compiler that is able to optimize better than a human. I've worked with some compilers, have looked at the assembly code they generate and have written assembly code myself, both for Intel based systems and for MIPS Rxxxx.

    The problem is that compilers have to generate code defensively. That means that they have to add code that enforces the laws of the higher language. Also, they have to follow clear guidelines with respect to register usage, parameter passing etc. that I as an assembly programer can change according to the situation. A compiler cannot behave that way because for this it has to see some bigger picture, and this is beyond it's abilities.

    I think you completely underestimate the difficulty of writing fast code. JIT and java stuff will allways be inferior in performance to serious code (assembly or not) written for serious processors by serious programers. The best human-machine combination is still this one: the human does the thinking, and the machine limits itself to be stupid.

    my view, anyway.

    rmstar.

  246. code optimization by Anonymous Coward · · Score: 0

    To start with, I beleive that Transmeta does not actually gain speed by code morphing, but it simply allows them to use a smaller chip. If I remember correctly it takes a 700mhz transmeta chip to compare with a 500mhz intel chip. This is why the chip is used mainly for laptops. Secondly, I don't think that a compiler will necessarily always be able to do better optimization than a human. I might be biased from programming the early VGA cards (and the fact I was using BC3.1), but there are many non-standard approaches to optimization that simply cannot happen with a compiler -- it requires knowledge of what the programing is trying to attempt. For example, a compiler will certainly not want to use the MMX floating point operations for everything, but for very time critical/protected areas you can get huge time savings (if you are using an 8086 of course). Thirdly, JIT will always have a longer start time. There is no way around this. Think of how long it takes to compile the Linux kernel. Imagine if it needed to compile _every_ time you rebooted. In theory, compile time cannot be made too much faster, especially if you start adding optimizations. The only reason why JIT is useful to _anything_, is that Java is so slow (as are almost all interpreted languages), that the time it takes to compile the byte code to machine specific code is worth it. And, you can write platform independent code (theoretically). However, I think computers are fast enough that hand coded optimizations are generally not needed for most applications. At the same time, try imagining some realtime embedded system (or again, the Linux kernel), where architecture may or may not be standard, and it cannot afford compiling anything anytime. Also, if you have ever tried writing an interrupt routine (at least in the old days), you were basically forced into writing in asm.

  247. Java specifics by Tal+Cohen · · Score: 2

    I don't want to argue about the general case, but as things relate to current-technology Java, JVMs and JITs, here are some things to consider:

    a. One of the most important optimization techniques is inlining. In OO, you can't inline unless you're sure a call is monomorphic (static). Java is even worse than the general case OO -- a Java program simply cannot use inlining in the vast majority of cases. Here's why: To inline a call in OOP, you must know for 100% that the distatch site is monomorphic (i.e. that the call is never polymorphic). But in Java, even if the compiler finds out that a certain method is never replaced by inheritance (overriden) in any class in the program, the program could load new classes at runtime (using Java's Reflection mechanism), thus turning monomorphic call sites into potentially polymorphic ones.

    (A few exceptions to this rule: first, there are final methods, or methods in final classes, that cannot be overridden. Second, there are Sealed JAR files, in which all package-level (default access level) methods cannot be overriden. IBM's JIT takes advantage of that, a fact which allows for great optimizations in the standard Java libraries.)

    b. Sun's JVM (the one for Windows, at least) implements references by a dual layer: the value stored in the reference fields is an index into an array maintained by the JVM, which contains the actual pointer to the memory block. This allows the memory manager to move blocks around without having to update all references to that block, but it also means each dereference actually requires two dereference operations. So when you write "this.field1.field2", you think you're making two dereferences but actually the machine makes four.

    c. The JVM is a stack machine. This means that naive implementations on register machines result in slow operations, that simply do not take proper advantage of the native architecture even when compiled to native code. Microsoft was the first to realize this, and indeed their JVMs were the fastest at the time (later on IBM's JVM took that trophy).

    Finally, a word or two on the general case: anything a JIT can do can also be done by a regular compiler, if you allow it to use profiling data from a few "typical" runs of the program (but note that not all programs have "typical" runs). The JIT, however, consumes memory at runtime, unlike a regular compiler - and with programs that require more memory than is physically available (practically all programs today), extra memory requirements reduce speed since they increase paging; so the JIT has its price.

    --
    - Tal Cohen
    1. Re:Java specifics by mch · · Score: 1
      But in Java, even if the compiler finds out that a certain method is never replaced by inheritance (overriden) in any class in the program, the program could load new classes at runtime (using Java's Reflection mechanism), thus turning monomorphic call sites into potentially polymorphic ones.

      Assuming that dynamic class loading isn't very common, you can just recompile when it happens.

      The JVM is a stack machine. This means that naive implementations on register machines result in slow operations, that simply do not take proper advantage of the native architecture even when compiled to native code.

      It would have to be a very naive implementation. The Gosling property of the JVM makes it close to trivial to rename the stack locations to registers.

  248. java JIT itself is not that slow. by monac · · Score: 4

    As shown at http://www.idiom.com/~zilla/Computer/javaCbenchmar k.html, even the linux jdk, which is relatively slow than other platform's, runs as fast as natively compiled C program when runs simple operations.

    In my understanding, java's poor performance is due to following factors:

    1. too generized api design resulted in very deep call stack. for example, printing current date and time using java.text.DateFormat uses much more calls than traditional C (a system call and a printf is enough for C). It gets more terrible in swing. use of interfaces causes the same problem i think.

    2. JNI is too slow. (can't understand why but it is known to be slow and java can't help but using many JNIs)

    3. as WORA is important in java, it is hard for java to use any platform specific resources such as Graphic card's accelerations. that is one of the reasons swing's design is getting ugly as it tries to boost its performance.

    4. too many small class files result in high I/O usage when JVM loads classes.

    5. no MACRO! for example, most of java API is synchronized, which is expensive, reguardless thread is used or not. the same for security check codes and platform specific implementation codes,
    etc. these problems can be solved by MACRO but...
    someone in sun doesn't like macro and i agree with him when think of java's purity.

    some of these problems are known to be solved in next jdk version(hotspot client version). slow synchronizations and class loading speed, etc.
    but mostly, they are java's design issue and never be solved, as java is mostly running in virtual machine and originally it is designed to be used for java chips, only when JVM is implemented in processor level, the speed problem will be solved.

    I'm looking forward to seeing java running in MAJC or Transmeta's chips.

    please correct me if i'm wrong in some and forgive me for my poor english.
    Cheers.

    --
    -- Y. J. Chun
  249. What about chaching? by Tune · · Score: 1

    Of course JIT takes some time to start up. But when a program is started next time, a smart OS (environment, etc. what you call this) can aways use a copy of a previous run.

    This improves time efficiency at the cost of space efficieny. Wrt. effectiveness: the original byte code (half-product) is "processor-independent".

    This leave much more space for speed improvent at the hardware level.

  250. Ooo, ooo, start on packing! by TheDullBlade · · Score: 2

    Seriously, when you communicate over a inter-machine connection, you have to comply with the communication standard.

    If the communciation protocol is, for example, byte based and you are writing portable code, you have to have a formal, portable method of chopping up that struct into bytes at the sending end and building it back up again (from bytes) at the receiving end. It's called serializing, and it's disgusting laziness to skip it; if you do, you deserve those problems, and also to get bleeding genital warts.

    Just dumping the bits of a fuzzily-defined internal data structure outside of the comfy innards of your own program is the ultimate in programmer carelessness (well, dividing by a null pointer and writing to the dereferenced quotient, with appropriate casts, is the ultimate in carelessness, but you get the point). Would you also send your pointers across a network and expect them to work at the other end?

    --
    /.
  251. Theory and practise by Anonymous Coward · · Score: 1

    In theory, a perfect compiler will outperform any human programmer writing hand-optimized assembly. In practise, a real compiler will outperform assembly code that is written by an incompetent programmer (who, for example, doesn't understand all the important details of the target architecture). Too bad many people writing assembly today seem to do a worse job than a decent C compiler.

    Jukka Ikonen / jaikonen@nospam.cc.hut.fi

    However, I'm pretty sure there are currently NO perfect compilers around and there won't be one for a long while, if ever. A GOOD assembly programmer who writes code for absolute best possible performance, not caring at all about maintainability, will outperform current compilers most of time. This happens at least on the most popular architecture of today, Intel x86. Speedups of 100% or more by tweaking gcc-generated assembly output is not uncommon, and VC++ is not much better.

    x86 is probably the most difficult mainstream architecture to optimize for, especially for a compiler (there being only 8 general purpose registers is the most significant reason). If a (JIT) compiler could use run-time profiling information to put emphasis on optimizing the hot spots, the performance would obviously improve. For example, variables that aren't used in the most likely execution paths can be stored in stack, freeing registers for more important uses.

    However, current JIT compilers typically don't outperform traditional static compilers. In order for them to be practical, they cannot perform very complex optimizations, since it would cause long time delays seemingly randomly during execution. Tight loops could be optimized well, but complex global optimizations are probably out of the question. In addition, Java's security model makes it practically impossible to get C-like performance in general. Especially clever (ab)use of pointers, casts and unions give a C programmer a clear advantage. Of course, a JIT compiler could be written for C, but with current compiler technology, it would be a lot slower than static compilers.

    Copying garbage collectors are asymptotically faster than ordinary free()/malloc() combo. But in practise a good programmer uses custom memory allocation routines if performance is important. Actually, it is possible to use a garbage collector from C or write a custom garbage collector optimized for a certain task. If the programmer is sufficiently skilled, he/she will always beat a general-purpose garbage collector. Besides, an incremental/generational (read: fast) garbage collector requires read/write barriers that may cause a significant performance hit.

    The lesson is: When talking about performance, theory does not count. Existing compiler technology, programmer skill levels and the time spent on optimizing code dictate real-world performance, not what a hypothetical perfect compiler could achieve.

  252. The "C is not portable" myth by Bj�rn+Stenberg · · Score: 2
    > You cannot write portable programs if your basic data types are not defined.

    This has to be one of the most widespread, and possibly destructive, myths about C programming. The truth is, if you write code that depends on the size of your basic types, your code will be less portable. The next generation machines will always handle your types in ways you will fail to predict.

    > And it also doesnt give todays hardware any advantage.

    Have you ever worked with a PowerPC processor? Some of these cores do not have 16-bit integers, only 32-bits. Yet, lots of people reasoning like you use them anyway (because "16 bits is enough for this value"). This forces the compiler to insert 16-bit emulation code, thus increasing code size and decreasing execution speed. Had these people ignored the size of their integers and just used 'int', the code would have been both faster and more portable!

    Take a look at a few of the most widely ported software packages around. How many of these redefine the types in order to "know the size of their types"? Not many, I can tell you.

    Writing portable code has little to do with worrying about the size of your integers, and lots to do with not trying to optimize your code to whatever system you happen to write the code on (by, for example, worrying about what the size of your integers are...)

  253. Its the libraries stupid by joss · · Score: 2

    The difference between theory and practice is that in theory there is no difference whilst in practice there is.

    First off JIT is not java specific. Transmetas chip does a form of JIT on everything - even x86 assembly code. Your basic argument that JIT can be efficient is correct (not more efficient than humans, but I'll come to that).

    The performance hit from Java comes far more from it's crappy libraries (Swing anybody) than the JIT aspects. Bounds checking, garbage collection, and inherent overhead of OO techniques all cause a performance hit, but the real trouble comes from crappy libraries written by crappy programmers who put 10 levels of indirection between a request and it's implementation.

    For instance the String class being written with 16 bit characters to make it unicode friendly when 98% of the java code written never uses this feature but still requires twice the time and space for every operation. Also Strings are not alterable so to alter a character you create a new StringBuffer object, copy everything, make an adjustment, then replace the old reference with the new so the old object eventually gets passed to garbage collection... all compared to
    buf[i] = 'a'; (C or C++). It's crap like this that makes Java seem so slow, not the JIT hit.

    Your argument about how JIT can outperform programmers because it has more information is flawed - any decent programmer will know where the spikes are and optimize accordingly. He will know what he can and can't do with the data, whilst the compiler has to take the strictest possible interpretation. Advanced optimization is too slow to do on the fly (although you can optimize the byte code to some extent with a decent javac, then optimize the translation with a JIT)

    Garbage collection argument is silly too. In C/C++ or assembly, the programmer has a choice as to whether he uses the stack or the heap. It makes his life more complicated, but as long as he knows what he's doing it's inherently more efficient. If he doesn't know what he's doing then he's useless in any langauge. If you want to, you can use a garbage collector in assembly or C++, but it's not the only technique available to you.

    Basically Java is slow for the same reason that Windows sys admin is slower than Linux sys admin.
    You don't trust the user with the rm command or trust the programmer with pointers - show him an idiot box if he tries to delete a file or bounds check that array access for him. It's one way of doing things, but it sure isn't efficient.

    --
    http://rareformnewmedia.com/
  254. that's what longs are for by TheDullBlade · · Score: 2

    And if you need really big numbers you can always code your own BigInt structs, though you might want to write a little preprocessor to do that (or use C++ and redefine all the ops for it).

    --
    /.
  255. Crusoe == Hotspot by anonymous+loser · · Score: 1

    Naturally, on the first iteration, crusoe will behave almost exactly like a JIT compiler, but adaptive optimization is another story. I would present the case that crusoe is more like Sun's HotSpot technology. If you are interested in the details, here is a lecture presented at JavaOne about java optimization, which talks about the adaptive run-time optimization in HotSpot.

    1. Re:Crusoe == Hotspot by Obsequious · · Score: 1
      Yahoo! Thanks -- I was waiting for someone to say this, and if I didn't see it, I was going to say it myself. :)

      A JIT Compiler is the brute-force approach. It's almost sort of a stop-gap technique, directly speaking to the complaints that Java is slow. Like any knee-jerk reaction, it's not a really great fit.

      The basic concept of a JIT Compiler is to take interpreted bytecode and compile it to native code on the off chance it might run faster. Sometimes this works, sometimes it doesn't. By the nature of what they do, server-oriented tasks get more benefit from JIT compilation than GUI apps.

      Server apps tend to spend more time in one method call (e.g. the service() method of a Servlet, which sits there spitting out HTML) than do GUI apps (which, according to the event model, jump from method to method handling requests.) Since GUI apps spend very little time in one method, it doesn't gain you much to compile a particular method.

      Hotspot, however, is more than a JIT compiler. It acts as a higher-level "intelligence" that monitors the execution patterns (i.e. profiles the code) and only bothers to compile the methods that would benefit from compiling. Thus, while HotSpot would spend more time reasoning out which methods need to be compiled, it will overall be more efficient, since its time will be better spent.

      Crusoe's code-morphing stuff is essentially a hardware implementation of something vaguely like Java's hotspot. It serves the same purpose -- figure out what code blocks would gain the most from being compiled and then compile them.

      The difference is that where HotSpot just finds "hot spots" in the code and invokes a generic compiler for the particular architecture, Crusoe has more knowledge of the VLIW core and can do better optimizations.

      I'm sure there's also a lot we don't know about the details of Crusoe. There's got to be stuff they don't want the competition to know. From reading the white papers, I still have only a cursory understanding of what their code-morphing stuff is and does.

      So I would say that comparing Crusoe to JIT compilation is like comparing a Ferrari to my Malibu. Comparing Crusoe to HotSpot is probably more like comparing a Ferrari to a Camaro.

      - Dan

  256. Re:a point about malloc/free vs. garbage collectio by ucblockhead · · Score: 2

    Or maybe it only bothers to do this when resources get tight. Perhaps another malloc comes along wanting the same size (roughly) as the thing just freed. We just give it the recently freed chunk. We then don't have to build a new one.

    Caveat: I have no idea if this makes sense, performance-wise, in the real world.

    --
    The cake is a pie
  257. Where did this article get its name? by p3d0 · · Score: 1

    Did I miss it? Where was it argued that Code Morphing is not the same as JIT compiling?
    --
    Patrick Doyle

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  258. Q: Performance of C++ STL vs Java JGL ? by FutileRedemption · · Score: 1

    Does anyone know about such a comparison?

    Or, alternatively one between STL and Suns collection classes?

    This would be very interesting and helpful, since this may be the only roadblock for using Java in a critical project right now.

    If anyone has an URL, please put it in an answer to this post.

  259. Compilers DO write better code than humans by Kvort · · Score: 1

    >As long as the programmer has more knowledge than the compiler, he will
    >always find tricks to save an instruction here or there and outperform
    >the compilers this way.

    This is true, on a small scale. If you want to solve any kind of a decent-sized problem, however, you are not going to want to implement the entire program in assembly. Even if you're not writing a GUI program, a large program would be nearly impossible to implement. Looking at a decent sized C++ program, how much duplication and cruft do you see? Imagine how much more there would be in an assembly program.

    As a real-world example, I believe that Quake3 uses MOSTLY C++ for implementation. Only the really intensive, often used portions of the code are implemented in assembler. I wonder if Carmack reads /. to correct me on this... :)

    Its worth spending twice as much time optimizing an algorithm that's going to account for 90% of your CPU time, but not otherwise. And only graphics apps really have these sorts of algorithms. In a GUI program, most of the CPU cycles are spent passing messages around the system, or in OS calls, which are also generally message-oriented. That would be code worth optimizing, maybe. In the application I'm working on, the most CPU intensive algorithm we have is a FFT, and I would bet that its probably the place where the least optimization could take place. The slowest part of our code (the part that uses the most CPU cycles for the least accomplished) is the part that writes the data out to the screen (After the FFT; Damned third-party controls). It takes approx 2 seconds to run a hundred thousand points through the FFT, and probably 20-30 seconds to display it. The extra time is taken up by GUI, message passing, and COM/OLE crap.

    The point is that only the most CPU intensive, high-performance algorithms really need to be written in Assembly. And most programs don't have these needs.

    >>>>>>>>>> Kvort

    --
    -Don't mind me, I'm personality-deficient and mentally-impaired.
  260. WARNING: nit picking a metaphor by Anonymous Coward · · Score: 0
    I could go on, but it feels like trying to describe that water is wet and stones sink in it to someone who thinks it's intuitive that water should sink into stones.

    <nit picking> A glass of water poured into a bucket of stones will sink, just as a glass of stones poured into a bucket of water will sink. Furthermore, while stones do not sink into molecules of water, water can seep into some stones. </nit picking>

  261. Match code frag. as opt. tech. (Was:Evolve code.) by Anonymous Coward · · Score: 0
    Generally I think that all that has to be done to make a compiler better or almost anything better is to increase the ammount of brute force checking. For example have a bunch of conditions that in effect say that if you have this and that instructions in C/C++ then you will have this output which can be created in assembly.

    I seriously doubt, that you have ever written any kind of code for a compiler!

    1. Please give an example on the type of code you'd want to optimize, any reasonably simple (small fragment of )ecode that doesn't depend on advanced integer arithmetic should compile to something as fast or faster than you would possibly write yourself... Please include example.c or .cpp, GCC input, GCC assembler output and your "better" ASM output.
    2. The technique is often used on small "windows" of asm-code at a stage very close to code-emission in compilers, and is known as "peep-hole" optimization, it is quite successfull on a very small group of problems -- and very expensive in terms of compile-time.
    3. What would really make the stuff go faster is:
      1. Invartiant detection, using compilers to find invariants in the code, and then deduce lemma's which may remove bounds-checks, type-lookups, casting or flush of registers to memory.
      2. Partial evaluation, make several instanses of the same code, whith certain parts of the code already evaluated, or arguments assumed to have fixed (often -- in this program -- used) values (Of course, this often introduces new invariants).
      3. Rescheduling instructions to run better pipelined, you would be amased at how good compilers are at this -- one technique i particularly like is using tree-automatons to select best-fits.
      4. Instruction selection -- using the right instructions for the processor.
      5. Inlining code, which makes more finding invariants easier, since some arguments may be known to have certain values (actually, inlining is a kind of partial evaluation)
      6. Moving the code to Single-Static-Assignment form (SSA-form) at some intermediate parse of the compiler, where every variable is only assigned once (this makes a lot of data-flow analyses work _very_ well)
      None of these (except maybe instruction selection) are even remotely feasible to do by human hand, and a most compilers only implement them poorly. Unfortunatly we do not have linkers who can select from a variety of instances of functions that have been partially evaluated to meet some (varying for each instance) input specification, this would increase code size by something like 10-fold, but only if all the partially evaluated functions were used.
  262. C is not for pansies by TheDullBlade · · Score: 2

    C is for efficient coding by trained experts, with the potential for portability if that is a desired goal. It provides no hand-holding that would interfere with run-time efficiency. The default behavior is and should be the most efficient behavior.

    There are plenty of other languages for people who are willing to sacrifice efficiency for more hand-holding.

    --
    /.
  263. until it plateaus out... by Anonymous Coward · · Score: 0
    The jitc would stop aggressively optimizing after the efficiency starts to plateau out, y/n?

    After all, there's some limit to how efficiently the code can be run on the given architecture (I would assume) and so once the compiler gets close to that limit, I'm guessing it would reach a point where it goes "Ok, I'll get a bigger jump in efficiency by just using this 57th generation piece of code, than by trying to optimize it one more time around"...?

    BTW, now that we're into tiny chip processes and superscalar and all that great crap, where's my 1024 RISC32 machines on a chip??? hmm? I want my connection machine!!!

  264. Why Optimize to overkill? by Anonymous Coward · · Score: 0

    Not that I don't appreciate optimized compilers and such, but compiler optimization hits the wall of diminishing returns pretty quickly. Especially when you consider that extreme optimization by a compiler in most cases will get you maybe 5% better performance in comparison to just a little optimization.

    When you compare this to the rate at which hardware is improving and it is trivial.


  265. GC by joss · · Score: 2

    > it's particularly nice in large and sophisticated object models where you almost always lose track of who has a reference to what

    Does this make anyone else nervous as heck. If you can't even keep track of when your objects should be deleted, how the hell do you keep track of anything. Increasing complexity requires more clarity of thought and design, not less. It's not the overhead of GC that bothers me, it's the sloppiness of thought it seems to encourage (try using JBuilder with less than 256Meg if you want to see what I mean).

    In some languages (eg ML), you really are working at a different level of abstraction and then GC makes perfect sense, but it's not GC explicity that makes life easier.


    --
    http://rareformnewmedia.com/