New Intermediate Language Proposed
WillOutPower writes "Sun is inviting Cray (of supercomputer fame) and IBM (needs no introduction...) to join and create a new intermediate run-time language for high-performance computing. After Java's bytecode, Java Grande, and Microsoft's IL for the Common Language Runtime, it seems a natural progression. I wonder if the format will be in XML? Does this mean ubiquitous grid computing? Maybe now I won't have to write my neural network in C for performance :-)"
I recall a system based on UCSD Pascal. You would
write an interpreter on your target hardware that
would run the Pascal p-code. It was supposed to
solve all sorts of problems. Except it was slow.
Nobody would write anything for it, I guess
because they didn't like Pascal, or UCSD didn't
fire anybody's imagination with the product.
I don't see why we need to go through this again.
If you need performance write it in assembler or
use nicely optimized C. If you don't then an
interpreted scripting language will usually
suffice. What's the benefit to yet another
layer of abstraction?
-- Programming with boost is like building a house with lego. It's cool, but I wouldn't want to live in it
Mountain View, Calif. - Sun Microsystems is inviting competitors IBM Corp. and Cray Inc. to collaborate on defining a new computer language it claims could bolster performance and productivity for scientific and technical computing. The effort is part of a government-sponsored program under which the three companies are competing to design a petascale-class computer by 2010.
Sun's goal is to apply its expertise in Java to defining an architecture-independent, low-level software standard - like Java bytecodes - that a language could present to any computer's run-time environment. Sun wants the so-called Portable Intermediate Language and Run-Time Environment to become an open industry standard.
The low-level software would have some support for existing computer languages. But users would gain maximum benefit when they generated the low-level code based on the new technical computing language Sun has asked IBM and Cray to help define.
Whether IBM and Cray will agree to collaborate on the effort is unclear. Both companies have their own software plans that include developing new languages and operating systems as part of their competing work on the High Productivity Computing Systems (HPCS) project under the Defense Advanced Research Projects Agency (Darpa).
"We think languages are one area where the three of us should cooperate, not compete," said Jim Mitchell, who took on leadership of Sun's HPCS effort in August.
Last week Sun proposed to IBM's HPCS researchers they pool separate efforts on such a software language, an idea Sun said Darpa officials back. Sun also plans to invite Cray into the effort. Representatives from IBM and Cray were not available at press time.
The language could be used not just for the petascale systems in the project, but for a broader class of scientific and technical computers.
"Java has made it easy to program using a small number of threads. But in this [technical computing] world you have to handle thousands or hundreds of thousands of threads. We need the right language constructs to do that," Mitchell said.
- Kaos games and encryption systems developer
Did I see XML and performance in the same sentence ?! ... brain overload.. does not make sense...
The effort is part of a government-sponsored program under which the three companies are competing to design a petascale-class computer by 2010.
Will Sun survive until then?
My problem? I was perfectly gruntled, until some numbnuts came by and dissed me.
Sun should also have invited us GCC developers to help with this. Most of us want a way to do intermodule optimizations, but we have the FSF looking over our shoulder on how we implement it. Right now (on the mainline) you have to compile all the source files at the same time to get IMA to work correctly, and you have to tell the compiler to produce a .o file first.
I really hope the author's smiley was to indicate that he understood that his string of buzzwords was meaningless.
What I hope is that Sun takes a good, long look at the only intermediate assembly that has been designed with language neutrality in mind, Parrot. While this article is over 2 years old, it's a decent starting point. Parrot has already been used to implement rudimentary versions of Perl 5, Perl 6, Python, Java, Scheme and a number of other languages. The proof of concept is done, and Sun could start with a wonderfully advanced next-generation bytecode language if they can avoid dismissing Parrot as "a Perl thing" with their usual disdain for things "not of Sun".... IBM, on the other hand, is usually more open to good ideas.
No wonder we have to keep making faster CPUs just to maintain the same performance. Is Java on a PIII or G4 any faster than hand-optimized assembly code on a 486 or 68030?
Soon we'll need a 10 GHz CPU just to be able to boot tomorrow's OS in less than 5 minutes.
That format could be extended into a vendor-neutral format for interpretation, just-in-time compilation, and batch compilation.
The article is very light on details.
Huh?
So, how many languages are being proposed here? A new "low-level" one, plus a higher-level "technical computing language" designed to make the most of the lower-level one? Just what's so special about this new low-level language that requires a specific new language to get the "maximum benefit" out of it? I don't have to write in Java to be able to compile to the JVM bytecode. For that matter, I could write in Java and compile to some other assembly language.
New back-ends ("low-level languages," if I understand the article) are added to GCC all the time. We never needed to add a whole 'nother front-end just for them.
I suspect that the real situation is less weird, and the journalist got confused... or heck, who knows, maybe they're proposing half a dozen new languages. It's Sun, after all.
Odd. I wouldn't have thought you'd need to do that these days anyway.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
"There is no problem in computer science that cannot be solved by adding another layer of indirection."
I don't think they're trying to create a language for "high-performance computing" but a language for a "high-performance multi-processor computer", since they're focusing on threads, and Sun (with its JVM) isn't a very good example of high performance.
In my opinion, I would like a C language variation that lets me specify how many bits I would like to use for a variable. It would save a lot of time through reduced memory bandwidth (cache space included), and it is very tedious to make a good implementation of that in assembly.
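C's bit-fields already let struct members declare a width in bits, which covers part of this wish. To show the memory saving the comment is after, here is a minimal Python sketch that packs 4-bit values two per byte by hand. The function names and the low-nibble-first layout are assumptions made for illustration only.

```python
def pack_nibbles(values):
    """Pack integers in the range 0..15 two per byte, low nibble first."""
    out = bytearray((len(values) + 1) // 2)
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError("value does not fit in 4 bits")
        # Even indices go to the low nibble, odd indices to the high nibble.
        out[i // 2] |= v << (4 * (i % 2))
    return bytes(out)


def unpack_nibbles(data, count):
    """Recover `count` 4-bit values from bytes produced by pack_nibbles."""
    return [(data[i // 2] >> (4 * (i % 2))) & 0xF for i in range(count)]


if __name__ == "__main__":
    values = [1, 15, 7, 0, 9]
    packed = pack_nibbles(values)
    print(len(values), "values stored in", len(packed), "bytes")
    print(unpack_nibbles(packed, len(values)))
```

Five values fit in three bytes here instead of five, at the cost of shift-and-mask work on every access, which is the tedium the comment wants a compiler to take over.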
So, one of the ideas behind C# was to make an intermediate language (MS-Java-byte-code, if you will) which could be quickly compiled for the CPU in question. Stick a system call environment and garbage collector around it and you have [roughly] what C# is. One of the nice things about Java was that it was for no specific machine... it was very very simple at the instruction level, but making native code from that can be a pain.
Now, from the looks of the posted article, some folks want an intermediate language that can represent concepts like instruction vectorization and maybe SMP (hyper-threading) and perhaps some other more complicated constructs that Java's machine code just doesn't talk about.
The end result is that you would have very fast machine code for the number-crunching loops in the code and portability. The compile time would be fairly quick and the optimization for the local CPU would be "smart" and fast if you marked up which instructions were vectorizable.
Why C# falls short, I can't say. I've only looked at the Java machine, never at how C# represents a program.
Hope this is helpful!
Sam
Architectural Neutral Distribution Format has been around for years and solves many of the same problems (and more).
I guess it is one more time around the (reinvention) wheel for sun.
Quite insightful... but it isn't as bad as it looks.
1) Nobody forces you to write in Java for PIII. Write hand-optimised asm snippets for PIII and include them in a bigger Java or C app for time-critical pieces. You get real PIII performance.
2) The software quality drops, but slower than CPU speed rises. That means your Java app for PIII will still work -slightly- faster than hand-coded ASM for 486.
3) Development cost. You can spend a week to write a really fast small piece of code in ASM. Or you can spend that week on writing quite a big, though slow app.
Most visible in games. Things like Morrowind, where crossing the map without stopping takes an hour or more, and exploring all the corners is months of play, were plainly impossible when it all required hand-coding. Now it takes a developer less time to create a mountain in the game than it takes a player to climb it. Of course the player needs a better CPU to display a mountain that wasn't hand-optimised, just created in an artificial high-level language defining the game world, but if you're going to enjoy the experience, why not?
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Because a well-designed intermediate language will help optimization. Being somewhat higher-level than raw machine code, not yet having to worry about the specific details of registers and pipelining, makes it easier to perform higher-level optimizations because the IL can be more easily analyzed. And when you compile from IL to the target you will have just the same opportunities for platform-specific optimizations as if you had compiled straight from the source language.
The other benefits of using an IL are manifold. New languages can be implemented without having to write a compiler for each platform. New architectures can be supported without having to write compilers for each language.
I have a positive modifier on Troll. When I mod someone Troll their karma should go UP!
All good compilers use at least one intermediate language. It's practically impossible to do good optimizations otherwise, even on a single platform. For example, you want to inline functions if that would improve performance, but determining whether it improves performance means looking at things like register allocation, which depends on things like the machine code implementation of complex expressions; however, inlining a function needs to be done with the higher-level information about flow control and the structure of the function call. So you basically can't do any of the interesting optimizations without a good intermediate language.
Furthermore, getting from the high-level language to the intermediate language is cross-platform, which means that any optimizations done at this level are then available to all of the code generators for different platforms; this code is reused across back-ends. It also means that you can support multiple front-ends with the same back-end, and make your C++ and Java automatically compatible by virtue of sharing an intermediate language, and they also both benefit from the same architecture-specific back-end.
There's no reason that having an intermediate language means that you'll stop compiling at that level and use an interpreter for the intermediate language to run the program. In fact, gcc always compiles its intermediate language into machine code, and it can compile Java bytecode into machine code as well. Modern JVMs compile the bytecode into native machine code when the tradeoff seems to be favorable, and they can do optimizations at this point that a C compiler can't do (such as inlining the function that a function pointer usually points to).
An intermediate language essentially pushes more of the skill into the optimizing compiler, because the same optimizing compiler can be used for more tasks. Also, if the compiler is used at runtime, it can optimize based on profiling the actual workload on the actual hardware. This is especially important if, for example, IBM decides to distribute a single set of binaries which should run optimally on all of their hardware; you run the optimizer with the best possible information.
Because that doesn't give you best performance. Machine code represents an exact processor implementation. Tradeoffs have to be made with backwards compatibility (e.g. Redhat is compiled for Pentium), expected cache sizes (optimising size vs performance), processor specifics (Itanium has 3 instructions per bundle, Sparc has one instruction after a branch), etc.
While it is true that you could compile for an exact machine, it is a horrible way of trying to ship stuff to other people, and it does require recompilation if anything changes. (The former is why Redhat pretty much picks base Pentium - if they didn't they would need 5 or so variants of each package just in the Intel/AMD space. Granted they do supply a few variants of some packages, but not everything, and Gentoo people can confirm that doing everything does help).
Using IL lets the system optimise for the exact system you are running at the point of invocation. It can even make choices not available at compile time. For example if memory is under pressure it can optimise for space rather than performance.
It also allows for way more aggressive optimisation based on what the program actually does. While whole program optimisation is becoming available now (generally implemented by considering all source as one unit at link time), that still doesn't address libraries. At runtime, bits of the standard libraries (e.g. UI, networking) can be more optimally integrated into the running program.
Machine code also holds back improvements. For example they could have made an x86 processor with double the number of registers years ago. If programs were using IL, a small change in the OS kernel and suddenly everything is running faster.
Needless to say, using IL aggressively is not new. To see it taken to the logical conclusion, look into the AS/400 (or whatever letter of the alphabet IBM calls it this week). I highly recommend Inside the AS/400 by Frank Soltis.
For its time, UCSD Pascal was an excellent language and operating system. Its main problems were price and politics, not performance or technical issues. Many people, including myself, wrote software for it. The speed penalty of the p-code interpreter was offset by the compactness of p-code, which was important on the memory-constrained PCs of the time. UCSD Pascal, like other alternative operating systems of the period, could not compete with MS-DOS and PC-DOS, which sold for well under $100, on price.
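As a rough illustration of the p-code idea discussed here (compact portable instructions run by a small per-machine interpreter), this is a minimal stack-machine sketch in Python. The opcode names and the tuple encoding are invented for the example; they are not the real UCSD p-code instruction set.

```python
# Toy stack-machine interpreter. The opcodes below are hypothetical and
# are NOT the actual UCSD p-code instructions.

def run(program):
    """Execute a list of (opcode, operand) pairs and return the final stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4 expressed as portable stack code.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]

if __name__ == "__main__":
    print(run(PROGRAM))
```

Only `run` would need rewriting for a new machine; `PROGRAM` stays the same, which is the portability argument. The dispatch loop around every instruction is also where the speed penalty comes from.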
Mea navis aericumbens anguillis abundat
Ah, come on, that is anything but insightful ... ... and so on for Smalltalk, Python, LISP ... ... which is wasting 80% of its die (and 80% of its resources, energy put into its production, waste produced and released into the environment) to be "compatible" with some old 8086 invention.
What's wrong with making a good compiler that writes directly to machine code?
a) it won't run on my phone, because no one will port the compiler
b) it won't run on my new internet-enabled microwave, because no one wants to port the compiler
c) it won't run on my car's electronics, as no one wants to port the compiler
d) it won't run on the next ESA space probe, the Venus Express, because no one wants to port the compiler
and so on.
What's wrong with having an ultimate VM designed and freeing all software developers from all porting issues once and for all?
What's wrong with having an ultimate VM designed and freeing all hardware developers from being held back by compatibility issues?
Come on, code geeks. Take a step into the future!! A 4 GHz Pentium is about 16 million times faster than the Apple ][ I used 15 years ago. Why should I be burdened with coding habits over 20 years old? I don't want to write 10 to 100 lines of assembler a day, because it expresses far less in terms of instructions than 10 to 100 lines of C. And I don't want to write 10 to 100 lines of C a day, because it expresses far less in terms of instructions than 10 to 100 lines of C++.
We need more different higher-level languages and more VMs, as it is easier to make a new VM than a new processor. We do not need more compilers for the same old languages just because someone built a new processor somewhere.
angel'o'sphere
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Maintaining high-performance code across CPU architectures is bad enough (and I know of some supercomputing centers which are continuing with technically inferior AMD64/Xeon clusters rather than switch to PPC970 precisely because they know they can't afford to re-optimize for that arch).
Factor in that today most numerically intensive code is still written in FORTRAN because competing languages simply can't be as easily optimized.
Now let's think about SMP: while POSIX threads are portable, the best performance probably requires different threading code depending on arch/Unix variant. (And of course NPTL for Linux is still in CVS.)
Now let's think about massively parallel, where inter-cpu communication will be handled a bit differently on every platform.
So the payoffs to developing an efficient cross-platform language layer are pretty substantial. (Which does not imply that I expect IBM to jump on to Sun's bandwagon on this :-))
Linux is Linux, if One need clarify their dist: <Dist>/GNU Linux
bsds are of course just BSD
The effort is part of a government-sponsored program under which the three companies are competing to design a petascale-class computer by 2010.
We already have such a runtime: it's called "CLR". The CLR is roughly like the JVM but with features required for high performance computing added (foremost, value classes).
Sun wants the so-called Portable Intermediate Language and Run-Time Environment to become an open industry standard.
I hope people won't fall for that again. Sun promised that Java would be an "open industry standard", but they withdrew from two open standards institutions and then turned Java over to a privately run consortium, with specifications only available under restrictive licenses.
Sun's goal is to apply its expertise in Java to defining an architecture-independent, low-level software standard - like Java bytecodes - that a language could present to any computer's run-time environment.
Sun's "expertise" in this area is not a recommendation: the JVM has a number of serious design problems (e.g., conformant array types, arithmetic requirements, lack of multidimensional arrays) that attest to Sun's lack of expertise and competence in this area.
What this amounts to is Sun conceding that Java simply isn't suitable as a high-performance numerical platform and that it will never get fixed (another broken promise from Sun). But because the CLR actually has many of the features needed for a high-performance numerical platform, Sun is worried about their marketshare.
The question for potential users is: why wait until 2010 when the CLR is already here? And why trust Sun after they have disappointed the community so thoroughly, both in terms of broken promises on an open Java standard and in terms of technology?
Maybe we will be using a portable high-performance runtime other than the CLR by 2010, but I sure hope Sun will have nothing to do with it. (In fact, I think there is a good chance Sun won't even be around then anymore.)
All good compilers already use well-designed intermediate languages. A general intermediate language that aims to be equally suitable for many high level languages will most likely be inferior to the best intermediate language for a particular high level language.
The other benefits of using an IL are manifold. New languages can be implemented without having to write a compiler for each platform.
Great. Just what we need - another of those braindead technological "advances" like human-readable data interchange formats that makes life easier for a few developers (simpler, cheaper compiler development) and harder for millions of users (worse performance). Frankly, the only advantage for the rest of us I can think of would be the higher probability of the resulting tools being mostly bug-free.
"I love my job, but I hate talking to people like you" (Freddie Mercury)
Ok, so now that Java is on the retreat they try to enter a new area?
It's probably because there's no Java user community or useful implementations out there. And it has virtually no practical application on the desktop for that matter. Maybe because it doesn't do 3D or sound. Or is not so useful as far as scalable RDBMS abstraction or a real application server for the enterprise. Maybe they need to move into the mobile market. What's really needed is a good Java IDE to get developers on board. Changes should be driven by the software community and making the source open would help as well. Sun should also be making improvements in Java's next(?) version.
You're right, I guess "we" should just cut our losses.
why run from Vincenzo?
Java is on the retreat??? Wow, I've been gainfully employed as a Java architect for the past five years; it musta' been a fluke. IBM, Oracle, Novell, et al must not know what they're doing by investing millions in building their products around the Java platform. Come to think of it, there are sooo many alternatives to Java for enterprise, server-side computing. Thank you for your insight. I'll turn in my resignation and pick up a .Net book tomorrow.
Xenon, where's my money? -Borno
In terms of compiler optimisation, the higher the language the better. Strict typing and a language that allows the compiler to infer more about the call tree should enable better global optimisation. Lower-level languages suffer from the problem that the programmer is explicitly describing how to do something, and not what the code is trying to do; thus the compiler can only unroll loops and perform peephole optimisations.
If a language were sufficiently high-level that you could describe to the compiler that you were implementing a recursive function (e.g. shell sort), the compiler should then be able to perform fold-unfold optimisation and convert the code into a more efficient tail-iterative function. Fans of Haskell and similar languages might recognise this. Some C compilers will convert recursion to iteration where possible, but only in simple cases.
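The recursion-to-iteration rewrite mentioned above can be shown by hand. This is a sketch only: the function names are made up, a simple summation stands in for a real algorithm, and Python itself does not perform this transformation, so each stage is written out manually.

```python
def sum_to_recursive(n):
    """Plain recursion: every call must wait for the next one to return."""
    if n == 0:
        return 0
    return n + sum_to_recursive(n - 1)


def sum_to_tail(n, acc=0):
    """Tail-recursive form: the recursive call is the very last action."""
    if n == 0:
        return acc
    return sum_to_tail(n - 1, acc + n)


def sum_to_loop(n):
    """What a compiler could emit once the call is in tail position."""
    acc = 0
    while n != 0:
        acc, n = acc + n, n - 1
    return acc


if __name__ == "__main__":
    print(sum_to_recursive(10), sum_to_tail(10), sum_to_loop(10))
```

The loop version needs no stack frames, so it handles inputs that would exceed the recursion limit in the first two versions.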
The fact is that today, even as C has reached maturity and as high level as it is, there are still some optimisations that are impossible because of subtleties of the language. For example, multiple pointers may point to the same memory, but depending on how the pointers are assigned, the compiler has no idea that this is the case, and has to follow the code in a literal fashion.
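The aliasing problem can be imitated in Python, with one-element lists standing in for C pointers. This is an analogy under that assumption, not a demonstration of any real compiler; the function names are hypothetical.

```python
def add_twice(dst, src):
    """Literal version: re-reads src[0] before each addition."""
    dst[0] += src[0]
    dst[0] += src[0]


def add_twice_folded(dst, src):
    """'Optimised' version: reads src[0] once. Only valid without aliasing."""
    dst[0] += 2 * src[0]


def run(fn, aliased):
    """Call fn with either two distinct cells or the same cell twice."""
    a = [1]
    b = a if aliased else [1]
    fn(a, b)
    return a[0]


if __name__ == "__main__":
    print(run(add_twice, False), run(add_twice_folded, False))  # same result
    print(run(add_twice, True), run(add_twice_folded, True))    # results differ
```

Because the two versions disagree as soon as `dst` and `src` refer to the same storage, a compiler that cannot rule out aliasing has to keep the literal version.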
My personal view is that languages like Java still have a lot to offer. I would like to see a lot more investment in the compiler to perform better optimisations, and would also like to see a compile-on-install system for Java like C#'s; if I run an application it would at least be nice if the compiled parts were cached somewhere. This, I believe, could bring good performance gains. It's interesting that Sun's Server Hotspot VM actually performs more optimisation when compiling a class than the Client VM; because of the increase in time taken to load and compile a class, the Client VM omits some optimisation techniques to favour speedier loading. I guess this decision is meant to make GUIs more responsive and reduce app load times; compile at install would remove this constraint. We should be going to higher-level languages, not lower, and concentrate on getting the compiler correct.
-- Mike
Well considering Java's startup time removes it from all manner of applications, it's a bit of a strawman to argue that startup time doesn't matter.
*cough* *cough*
Bullshit
Bullshit
Bullshit
Bullshit
Bullshit
Bullshit
Bullshit
Please take your bullshit trolling elsewhere. There are those of us with work to do.
Javascript + Nintendo DSi = DSiCade
Imagine N high-level languages and M target platforms. A naive approach would wind up creating NxM separate compilers.
Intermediate languages (ILs) allow you to write N "front-ends" that compile the N high-level languages to the IL, and M "back-ends" that compile from the IL to the M target platforms. So rather than needing NxM compilers, you only need N+M.
Even more significant is the optimizer. Front-ends and back-ends are relatively straightforward, but optimizers are very hard to write well. In the naive approach, you need NxM optimizers. With an IL, you only need one. The front-end translates to IL; the optimizer transforms IL to better IL, and the back-end translates to native code.
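A toy version of that arrangement, as a sketch: two front-ends, one shared optimizer, and two back-ends around a made-up IL. Everything here (the tuple-based IL, the postfix and prefix source notations, the `PUSH`/`LOAD` instruction names) is invented for illustration and does not describe any real compiler.

```python
# IL nodes: ("const", n), ("var", name), or (op, left, right) with op in
# {"add", "mul"}. All of this is a hypothetical toy design.
OPS = {"+": "add", "*": "mul"}


def _atom(tok):
    return ("const", int(tok)) if tok.isdigit() else ("var", tok)


def front_end_postfix(text):
    """Front-end 1: postfix source such as 'x 2 3 + *'."""
    stack = []
    for tok in text.split():
        if tok in OPS:
            right, left = stack.pop(), stack.pop()
            stack.append((OPS[tok], left, right))
        else:
            stack.append(_atom(tok))
    return stack.pop()


def front_end_prefix(text):
    """Front-end 2: prefix source such as '* x + 2 3'."""
    tokens = iter(text.split())

    def parse():
        tok = next(tokens)
        if tok in OPS:
            left = parse()
            right = parse()
            return (OPS[tok], left, right)
        return _atom(tok)

    return parse()


def optimize(node):
    """The single shared optimizer: constant folding, IL in and IL out."""
    if node[0] in ("const", "var"):
        return node
    op, left, right = node[0], optimize(node[1]), optimize(node[2])
    if left[0] == "const" and right[0] == "const":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("const", value)
    return (op, left, right)


def back_end_stack(node):
    """Back-end 1: instructions for an imaginary stack machine."""
    if node[0] == "const":
        return [f"PUSH {node[1]}"]
    if node[0] == "var":
        return [f"LOAD {node[1]}"]
    return back_end_stack(node[1]) + back_end_stack(node[2]) + [node[0].upper()]


def back_end_infix(node):
    """Back-end 2: a parenthesized infix expression."""
    if node[0] in ("const", "var"):
        return str(node[1])
    sym = "+" if node[0] == "add" else "*"
    return f"({back_end_infix(node[1])} {sym} {back_end_infix(node[2])})"


if __name__ == "__main__":
    il = optimize(front_end_prefix("* x + 2 3"))
    print(back_end_stack(il))
    print(back_end_infix(il))
```

Adding a third source notation means writing one more front-end, and adding a third target means one more back-end; `optimize` is written once and serves every combination.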
In summary, to answer one of your questions:
Every optimizing compiler uses an IL anyway. These companies, I presume, are simply agreeing to use the same IL across their products (though I'm only guessing because the article is slashdotted).
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
You need Forth - possibly the only language where you make up the language as you go along.
Example of the Forth definition for the "make everything explode because the time or energy has run out" routine in a game I wrote years ago:
: kill_everything ( - )
FWIW, "bursts" was a convenience word used to make it read better. Its definition was:
: bursts ( thing_to_burst - )
(The bits in brackets are stack diagram comments. The argument "thing_to_burst" is actually the address of the data structure representing the animated entity.)
By judicious use of the English language in choosing your names, you could write what people thought was pseudocode, and it compiled and ran :-)
Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.