Transmeta Code Morphing != Just In Time

Re:Profoundly counterintuitive? by redhog · 2000-01-27 00:36 · Score: 2

I will only comment on one of your point, while you are, in my opinion, right, and insightfull (You are insightfull even on this one, but not totally right):

When given an environment like Java which doesn't allow pointers and other antiquated memory hniques a good dynamic compiler with a modern garbage collector is both faster and more efficient than any hand coded attempt on all but the most simple of applications.

The point is that the average application has inflated enormeous over the years. Thanks to stupid programmers, abstraction and cool languages. You gave the key to this yourself "on all but the most simple of applications"...
Not that a good compiler cannot outperform even a good assembly hacker on, for instance, memmory allocation. And not that abstraction in itself must lead to inneficiency, but it makes it inherently easy to allocate/free memory for the programmer, who will then of course allocate/free more memmory.
--The knowledge that you are an idiot, is what distinguishes you from one.

--
--The knowledge that you are an idiot, is what distinguishes you from one.

Re:Evolve code. by Geotrash · 2000-01-27 00:42 · Score: 2

Other engineers are required to use the minimum resources to achieve the maximum requirements of the project design, shouldn't computer 'engineers' be able to acheive the same.

Apples and oranges. The cost of creation materials for say a building, demands that the least amount of them be used. Not so with bits and bytes. These days, they're almost free. I applaud the true craftsmen of the programming world for their principle and talents, but with all due respect, to me it's penny wise, pound foolish to waste time reinventing the wheel each time you have to write code. Did you dig your home's foundation with a trowel?

Re:Evolve code. by DQuinn · 2000-01-27 00:50 · Score: 4

Sorry... this post is probably meant to be deeper inside this thread.

Are we not all just dancing around the idea of a human compiler with a vast database of algorithms, intuition, look-ahead properties, and an abstracted idea of what it is exactly that the program is doing? IE. There is no compiler, to my knowledge, which writes highly effective parallel code. Why is that? Simply because no compiler can really understand the overall problem.

How long do you think it will be before the all-powerful AI compiler is written, with all of these abilities, and more, on a box that runs at a few GHz?

Or maybe i'm just being a little too science fictiony here :)

Cheers,
D

--
os.system("perl -e 'print \"My first Python Script.\"'")

Re:The "C is not portable" myth by arivanov · 2000-01-26 23:07 · Score: 2

Take a look at a few of the most widely ported software packages around. How many of these redefine the types in order to "know the size of their types"? Not many, I can tell you.

Quite a few. Amazingly these are usually those that have portability problems. PGP 5 is a good example.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/

Re:Compiled Once vs. Compiled multiple times by mangino · 2000-01-26 23:08 · Score: 2

What if you compiled the original code with performance analysis included, and then fed that performance analysis back into a traditional compiler to execute again. In this way, you get all of the benefits of the above JIT system without the running overhead of the JIT. Sure, it takes a few thousand runs to get the program optimized, and you can only optimize for the average case, but most programs that need to be heavily optimized will have a roughly similar executaion pattern every time.

I may be wrong, but you can do this with the SUNWspro workshop compilers. This technology would be nice to have in gcc as well.

Mike
--
Mike Mangino Consultant, Analysts International

--
Mike Mangino
mmangino@acm.org

Re:Profoundly counterintuitive? by Hard_Code · 2000-01-26 23:15 · Score: 2

I think you may be missing (or misunderstanding) the point a bit. Sure, you can furbish an example, or a language, in which a human can optimize the program /statically/ better than a compiler/computer can optimize a program /statically/. But the author makes the point that the most frequently executed parts of the program are very sparse, and that the cost of profiling the program dynamically can be recouped by doing intelligent dynamic optimizations. As hardware gets faster and more complex I think this will be more true. There will simply be no way for humans to statically optimize for such a non-deterministic runtime environment. For example, how many programmers explicitly specify parallelism in their programs? How can they be sure that the CPU at runtime doesn't know better than them how to optimize the program? I say let the CPU decide. There is going to come a point in which it is just not feasible for humans to even attempt to optimize for such a complex runtime environment, and that time is coming fast. The faster chips get, the smaller the more amortized the overhead of finding better optimizations at runtime becomes.

Jazilla.org - the Java Mozilla

--

It's 10 PM. Do you know if you're un-American?

Re:Compilers dont write better code than humans by Junks+Jerzey · 2000-01-26 23:15 · Score: 3

You're right, of course. No matter what some people have been trying to say for 40 years or more, you can almost always write better code than a compiler. This is especially true when you're dealing with something bigger than a a single function. The usual way of showing that a compiler is better than a human is by using one smallish C function as an example. That's a pointless example, because the benefit comes from analyzing that function in context and not on its own.

The point of diminishing returns comes into play quickly, however. For example, take the renderer for most any fast 3D game. If you went into the routine that passes triangles to the graphics card, a frequent hot spot, and added a pointless call to a supposedly expensive function, like sqrt, you're not going to notice an effect on frame rate. Doing a higher level optimization, like removing a single polygon from a model, is going to be more of a benefit than optimizing the polygon code, but it's still not going to be noticible. Sometimes you can get big benefits by using algorithms that look more complicated, ones you wouldn't want to approach in assembly, even though they use more code.

With complex programs, it's even conceivable for an interpreted language to out run a compiled one, because everything comes down to architecture and an understanding of the problem. This is hasn't been the case with Java, because Java is a fairly low level language (the more abstract a language is, the less win from compliation) and because Java has become entrenched in the "learn programming in 14 days" market of web designers turned programmers.

So, yes, an assembly programmer can outrun most any compiler. But does it matter? Almost never.

Re:Ooo, ooo, start on packing! by ucblockhead · 2000-01-26 23:18 · Score: 2

Would you also send your pointers across a network and expect them to work at the other end?

The idiots who wrote the code that caused me the pain would have...

This involves one of the internal debates I've had about C for years. C allows you to shoot yourself in the foot. That is all well and good as I (I hope) know what I'm doing. The trouble is that, with turnover in this industry what it is, a new foot might have arrived by the time the bullet hits. (Unfortunately, I've often been the new foot.)

--
The cake is a pie

Re:Match code frag. as opt. tech. (Was:Evolve code by slashdot-terminal · 2000-01-27 01:01 · Score: 2

1.Please give an example on the type of code you'd want to optimize, any reasonably simple (small fragment of )ecode that doesn't depend on advanced integer arithmetic should compile to something as fast or faster than you
would possibly write yourself... Please include example.c or .cpp, GCC input, GCC assembler output and your "better" ASM output.

I was talking about a simple brute force compare and if it dosn't work then move on type thing. I think that this type of thing will always work (just take a little while). Considering that most modern computers are extremely powerful then I have to say that such brute force is still possible. What I don't get and still don't get is how people get all the time to actually research all this stuff. Does anyone have a functional life anymore? I am took courses for an associate of science and they didn't get anywhere as complex as all the little theory arguments as I have seen here.

1.Invartiant detection, using compilers to find invariants in the code, and then deduce lemma's which may remove bounds-checks, type-lookups, casting or flush of registers to memory.

Could someone please explain what a lemma is I'm sorry I guess I didn't take Phd level math courses.

--
Slashdot social engineering at it's finest

Evolve code. by Anonymous Coward · 2000-01-26 21:28 · Score: 3

"In the general case", perhaps a compiler will produce better code (and certainlty not, as yet, for specialised cases - an inner loop coded direct to PPC macro assembler (PASM on Amiga PPC) is the fastest code I've ever written, but humans still write compilers. Now, if they were to force-evolve code using a genetic algorithm, you might get code better than any a human code write, but it'll probably depend on some weird side effect to some obscure instruction, or the rtesosnant frequency of your ram bus, or something...

Cool article, all the same though. I'd love to see more of this sort of thing on Slashdot, like back in the old days.

Re:Evolve code. by gfxguy · 2000-01-27 01:29 · Score: 2

There's also lot's of valid reasons for code bloat. Well, maybe not real "bloat", which would imply lot's of frivalous code. What I mean is that there are lot's of valid reasons why programs are growing ever larger.
The use of libraries, even for a small number of functions, increases code size (thus the 20k "Hello World" program). However, if we proceed to enhance the "Hello World" program to do a lot more, including input and output and simple processing - the code does not grow proportionately. In other words, a program 100 times bigger and much more useful than "Hello World" may only take up twice as much space.
Code size increases today are mainly due to interface issues. Very little interface code in Word Perfect 5.0 for DOS compared to Word Perfect 2000. It's a necessary part of the modern program.
I just don't think it's crappy compilers, or necessarily crappy code writing, that's to blame. It's feature bloat. And like anything else, as it becomes larger it becomes harder to steer.
The fact is, though, wether we need the features or not, programs today do a lot more than they used to. Word Processors give you many more format options, fonts, merge features, etc., then they ever did. Add the big GUI code (90% of most of my programs ends up being GUI stuff these days), and you get a modern "bloated" program.
I hate to say it, and I write command line utilities as much as possible, but when your target audience expects a visual interface, it adds a LOT of overhead to what might otherwise be a simple program. How big is "Hello World" compiled using Xtoolkit or Motif?!? Not to mention the libraries that may need to be loaded at execution time, making the perceived size even larger.
I'm not excusing poor programming, and I certainly have a dislike of "Visual" programming tools for my own reasons, but code growth is a necessary evil.

----------

--
Stupid sexy Flanders.
Re:Evolve code. by Anonymous Coward · 2000-01-26 21:44 · Score: 3

For deeply pipelined, agressively superscalar architectures, I fail to believe that a human being could possibly solve the difficult scheduling and graph-theoretical problems that optimal instruction ordering requires better than a mechanistic compiler. Current architectures are simply too complex and subtle for a project of non-trivial size to be more efficiently coded by humans at such a low level.
Nevermind of course, that for real prejects, the trade off in maintainability, portability, and extensibility is almost not worth any demonstrable performance gain anyway.
Re:Evolve code. by Bryan+Andersen · 2000-01-26 21:57 · Score: 2

Nevermind of course, that for real prejects, the trade off in maintainability, portability, and extensibility is almost not worth any demonstrable performance gain anyway.
Some people just have no idea how true this is, and I've had to work with many like that.
Re:Evolve code. by Alex+Belits · 2000-01-27 08:38 · Score: 2

How about this one? Lots of people want to be "Operators" because they get to spend time with their family, and don't get paged at three in the morning or during their daughters 8th birthday party just to come in to work to fix a a bug they introduced because they were just stupid? We also have hobbies that don't include computers, or computer games, and usually involve more than 4 people.

And why should anyone other than you care? Your "hobbies" most likely contribute nothing to lives of others, yet the price, society pays for supporting overpaid drones becomes rather noticeable. Why should I care, how wide was someone's smile at his daughter's birthday party if that person was responsible for network (or phone, or power) outage?

--
Contrary to the popular belief, there indeed is no God.
Re:Evolve code. by Anonymous Coward · 2000-01-26 22:15 · Score: 3

Now, if they were to force-evolve code using a genetic algorithm, you might get code better than any a human code write

Of course, you'd have to put up with the fact that this would not be executable in Kansas.
Re:Evolve code. by slashdot-terminal · 2000-01-26 22:46 · Score: 2

"In the general case", perhaps a compiler will produce better code (and certainlty not, as yet, for specialised cases - an inner loop coded direct to PPC macro assembler (PASM on Amiga PPC) is the fastest code I've ever written, but
humans still write compilers. Now, if they were to force-evolve code using a genetic algorithm, you might get code better than any a human code write, but it'll probably depend on some weird side effect to some obscure instruction,
or the rtesosnant frequency of your ram bus, or something...

Evolving code? What actual evidence that this is even within the reach of all but 2 people in the world. I just can't see how exactly this works. Ok I'll shut up about that for this post.

Generally I think that all that has to be done to make a compiler better or almost anything better is to increase the ammount of brute force checking. For example have a bunch of conditions that in effect say that if you have this and that instructions in C/C++ then you will have this output which can be created in assembly. Now this may increase the size of the compiler but it would allow faster compiler creation. Spending a large quantity of time on something that has almost hit a brick wall is usually never the way to go in this world. What a lot of people are finding out is that to get anywhere close to the level of education that is required is an incalculable frustration. Why do you think that everyone and their mother is going to those "schools" that teach people how to do network admining and such? It's because they want quick money in something that dosn't turn you into Ted Kazanski because of all the mental straining.

Most likely by the time I actually get the education level to handle this type of thing what I have studied will be obsolete anyway.

--
Slashdot social engineering at it's finest
Re:Evolve code. by slashdot-terminal · 2000-01-26 22:55 · Score: 2

It could also be said that humans can't solve the problem of writing a compiler that can handle the complexity of current architectures. Maybe we need to write a compiler capable of writing a compiler capable of writing a compiler...

If you think that something like the linux kernel or emacs is complex just try working on one of those little dears. What's wrong with just getting people to actually do the design work themselves. We created C++ and Java for the reasons of creating better code and creating maintainable programs. We must really look objectively and say is creating stuff for lazy people really worth the effort. Do you really want to be out of a job because the computer can just write that new app by itself? Do you really want to be in college for 20 years just to be bested by a machine that can supposedly do better than you? Guess what that leaves for any significant money:

1. business: generally not very much fun seeing other people make money, thinking about money, scamming the customer, and not too interesting for computer professionals.

2. Scientific research: Wow can you say compelex! and difficult to get into! Essentially you spend a great deal of time in college which means you have either to get a crappy job washing disher and scrubbing floors or take out thousands of dollars in student loans to get by. Not to mention the very sharp learning curve that it takes in doing these things.

Now in my case I do not handle stress very well at all. I hate spending large ammounts of time doing anything that is either hard or boring. I have historically not had a very high success rate in solving complex problems which leaves me feeling betrayed by the problem. It's little problems like this that are taxing enough on the human condition that are going to make the task of getting into computers next to impossible.

--
Slashdot social engineering at it's finest
Re:Evolve code. by silichrome · 2000-01-27 00:26 · Score: 2

I've read some of the post for this thread. I should say something P.C. like the processing
power of the moment increases every 11 months, but frankly its a cop-out. Let me explain
before you send the flame mail, true it is a fact that according to Moores law, that memory
(processor also) increases capacity by a factor of 2 every 11 months. But there is also
another law qouted be system 'engineers' , "the faster the hardware available, the slower the software that operates on the hardware". You can disagree with me, but its a simple fact
that major system software such as $MS Windows$ requires a more powerful processor, more memory and a bigger harddisk.
Increasingly developement of programs operate in the megabyte range. My question is why! , At one time ALL programs were limited to 64K segment, because that was the standard size for a ram chip. A "hello world" program compiled in C++(c evil progeny) can easily exceed 20k.
You probably say BIG DEAL, even coded up in Turbo Pascal the max is 2k. I dont think that I am eccentric because I don't want BLOATED CODE!. Getting back on task, code morphing is a polished version of self-modifing code. I know it isnt obvious on the surface, but think does it really matter if the code that is emitted by the compiler is part of the active process, the end result is code is executed, that wasn't initially envisioned by the program developer.
I know thgis post is a little off-topic and probably get moderated out, but striving toward
maximum efficiency should be the focus. Laziness and apathy shouldnt dictated the rule stick
. Other engineers are required to use the minimum resources to achieve the maximum requirements of the project design, shouldn't computer 'engineers' be able to acheive the same.

Java Byte Code by infodragon · 2000-01-26 21:28 · Score: 2

The Curso technology will probably make the Java chip a reality and improve upon the idea all in one shot. Program the morphing code to run Java Byte Code and you have a Java chip which optmises your code. A very clean language that you can easily maintain and not sacrifice execution speed. Every Software development Managers dream!

--
If at first you don't succeed, skydiving is not for you.

Re:Java Byte Code by jilles · 2000-01-27 01:18 · Score: 2

The only advantage C++ has is it's speed. There is no other advantage. There's lots of disadvantages though.

Java is easier to code, easier to maintain, platform independent. And that's only the language. C++ is going to be cobol's replacement. The next generation of programmers will make huge amounts of money keeping C++ code running. Don't get me wrong, C++ is a powerfull language in the right hands. Unfortunately it often ends up in the wrong hands: people who never really understood the OO paradigm, people who love to use multiple inheritance. people who think that macros actually improve the maintainability of their code, ... (all the horrible things you can do to make source code unreadable) ...

You are probably right that java is suitable for web applications but then, what isn't webrelated these days? Standalone programs are a dead end for most domains nowadays. The client server architectures and n-tier architectures in use nowadays make networking a requirement not an optional feature.

--

Jilles
Re:Java Byte Code by slashdot-terminal · 2000-01-27 01:40 · Score: 2

Java is easier to code, easier to maintain, platform independent. And that's only the language. C++ is going to be cobol's replacement. The next generation of programmers will make huge amounts of money keeping C++ code
running. Don't get me wrong, C++ is a powerfull language in the right hands. Unfortunately it often ends up in the wrong hands: people who never really understood the OO paradigm, people who love to use multiple inheritance.
people who think that macros actually improve the maintainability of their code, ... (all the horrible things you can do to make source code unreadable) ...

I have actually learned/am learing C++ on a collegiate level. I can say that they do not offer Java in the CS department. This is a reflection in the fact that Java is not important for what programmers actually do in the world. There is one class of Java taught in the whole college and that is in a business related sub-optimal computer program. This surely says something about what people really do. I would wager a whole keg of beer that you would find a large percentage of schools who do not use java and a larger portion of software development that dosn't use it. (Around 30-70%). As long as java remains so un interesting and tied to the web and the internet I will never learn it. Haven't you ever tried to access the internet and couldn't for whatever reason? Now look at the number of times you computer has prevented you from accessing the data you need. Now what should happen for most people is that the internet should be statistically more prone to failure than traditional media and far less controllable and customizable for your needs.

You are probably right that java is suitable for web applications but then, what isn't webrelated these days? Standalone programs are a dead end for most domains nowadays. The client server architectures and n-tier architectures in use
nowadays make networking a requirement not an optional feature.

This is quite bad for at least a few reasons:

1. access: If I need to have more silly requirements then if begins to decrease the ability of that application to actually get something accomplished in a timely manner. When I get a computer do I have to get say Phd in CS do use said computer (contrary to what people would like of the world)? No. This is because it's not necessary. Suppose I have a video game. Does say even dramcast or playstation games or even almost any PC game really *need* to be connected to a network? I can't really see any advantage if I am just a single lone person why that makes the experience any better for me. Almost anything you want control over does not *need* to have access to the web. Does working with say simple spread sheets need web access? How about word processing?

Also let's look at how the business model works in the internet world for services. Usually the trend has been that the individual has little or no control. This is reinforced with the huge cost of actually getting a dedicated server or bandwidth so that you can control it. Again for the average person (typical consumer who wishes to utilize the computer and has a moderate income of say $25,000-$70,000 USD/yr even if they are schooled in CS) this dosn't help at all in any way. Tell me why I shouldn't have access to all the things that I want to do at my finger tips without the need for someone to control it, charge me for it, change it, or make use of a series of networks which quite frankly are controlled by robber barons who don't care about the little guy?

--
Slashdot social engineering at it's finest
Re:Java Byte Code by jilles · 2000-01-27 02:32 · Score: 2

"This surely says something about what people really do.'

I think it says more about the college you are attending. If the business administration department (the place where they create those clueless marketing types) is the first to adopt Java in the college you are attending I have just one advice: get out of that conservative hole you are sitting.

"As long as java remains so un interesting and tied to the web and the internet I will never learn it."

If you don't see the point now you'll never see it. I give up. Happy hacking.

BTW. have you tried COBOL, I heard 90% of the code in use today is COBOL (just like you no source for my information though) must be a great language, fast too!

--

Jilles
Re:Java Byte Code by jilles · 2000-01-26 21:49 · Score: 3

Java bytecode interpretation is not the main cause for java's weak performance. The cause lies in mechanisms like garbage collection, memory allocation, thread synchronization etc. which are expensive. If you'd implement similar mechanisms (i.e. with all the built in safety) in C++ you'd have exactly the same problems (I don't think that's actually feasible though).

--

Jilles
Re:Java Byte Code by arivanov · 2000-01-26 21:56 · Score: 5

No it will not.

Think of Java licencing and control issues. Discussed widely on slashdot. Actually I shall retract this statement if one of the MAJOR league players will accept Crusoe as a primary CPU for at least one machine class.

Otherwise it is obvious that it could be thy Java machine, but it is least likely to be.

I think Cruose will actually make a reality something else which is much cooler than Java. It will make real thy developer's dream - the coat of many colors: The affordable machine of many architectures.

On the basis of Crusoe even now you can build a machine that can happily emulate:

Mac, Sun (lower end), IBM PPC (lower end), SGI (lower end), Alpha and curse it x86.

1. All these are PCI based.

2. The differences in chipsets can be ignored under the "one OS to rule them all, one os to find them, one os to bring them all and in (oh well cutting out darkness) bind them ". It can have drivers for the chipset and peripherals in question.

3. You may actually do the reverse thing and develop drivers for the peripherals and the chipset for all platforms in question (not a hell of an effort, actually quite achievable). If peripherals are something like adaptec, tulip and a PCI VGA the drivers are basically there already. So you have only the chipset left. Actually you can intercept these and emulate chipset behaviour if you so desire.

Overall - the developer's dream can become a reality for just a few bucks - about the price of a PC (excluding licencing for OSes and sowftare of course ;-)

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:Java Byte Code by slashdot-terminal · 2000-01-26 23:02 · Score: 2

The Curso technology will probably make the Java chip a reality and improve upon the idea all in one shot. Program the morphing code to run Java Byte Code and you have a Java chip which optmises your code. A very clean
language that you can easily maintain and not sacrifice execution speed. Every Software development Managers dream!

I think that C++ is more complex and has a little more maturity. As it stands now you really shouldn't use java for anything more than web related stuff. This is because of it's inherent use of networking and such. Why force the use of network connectivity when standalone programs are just fine. From a standpoint of looking at failure the more points of reliance that are utilized the worse the design. If I have to talk to 10 different people before I can get ahold of say my boss he has put a large degree of failure into his interpersonal communication system. This can be illustrated in events like the Challenger space shuttle disaster where people did not really take anything seriously because of ineffective communications.

--
Slashdot social engineering at it's finest

Re:GC by Mr+T · 2000-01-27 01:13 · Score: 2

It's not sloppiness so much as what C++ let's you get away with. You start building objects and then you have to communicate between them. Do you provide references to objects or do you provide copies of the objects and then update them all when one changes? Once you start leaking references, if they go far enough, problem errupt.

On large systems, master C++ programmers, master programmers have had these problems in real world applications. It's not so much not knowing when to delete objects so much as making sure you delete enough pieces when you delete them. If you delete an object that is referenced by another then you're going to have a crash. If you delete an object but not the references it contains then you have to have another reference to each of those objects and you have to know what you're down to the last reference so you can delete the object.

I agree about the sloppiness of thought associated with java and GC but that laziness and the willingness to be sloppy is there regardless of whether or not the programmers are using GC. TakeGC away and they may be a little bit more careful but they've already shown that they are willing to be that lazy and sloppy.

--
This is my signature. There are many signatures like it but this one is mine..

Re:java JIT itself is not that slow. by radish · 2000-01-26 23:23 · Score: 2

3. as WORA is important in java, it is hard for java to use any platform specific resources such as Graphic card's accelerations. that is
one of the reasons swing's design is getting ugly as it tries to boost its performance.

Surely it's the JVM's job to abstract hardware acceleration in the same way as, say, DirectX or OpenGL? For instance, my windows JVM might know about directX and so implement the primitive drawing functions within the swing classfiles using the accelerated calls. I don't know if that's what happens, but it strikes me as the logical plan.

--

---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

Re:Self-Modifying Code by ZoeSch · 2000-01-26 23:23 · Score: 4

Nevertheless I'd like to point out that JIT is based a somewhat dangerous technique: a program that alters (its own) code. I believe this technique was used in the eighties to scare off hackers by making code incomprehensible and hard to disassemble until the program was actually running. Also (even on a good old 6502 processor) it's possible to make some speed improvents peeking and poking into the code you're actually executing.

Not at all true... self modifying code is still around not only on JIT compilers but in interpreted languages and even device drivers (Or how do you think some device drivers adjust in real time to hardware changes?). And it's not a dangerous technique in itself, but (As usual) when executed improperly it can be a wild beast (As most people discovered when thy installed their first Cyrix and AMD 486 processors and discovered most x86 code wouldn't run properly because of cache integrity issues)

When compilers for Microcomputers got faster and most processor architectures (known as Harvard architecture, I believe) explicitly require a division of RW and RO memory segments, self-modifying code was abandoned...

First of all, Harvard machines require separate instruction and data pipelines and memory spaces, and 99% of the CPU's on the market (General, Embedded, etc.) are Von Neumann machines which use a single space to store instruction and data. The thing is, newer CPU's (Even if they're Von Neumann desings)have separate caches for data and instructions and most self modifying code violates the cache integrity (See above) because the code is modified in the cache but never stored back into memory (Unless you use a write-through cache design which is impractical from the performace point of view).

BUT (And that's a big BUT :) if your JIT VM forces periodic cache writes to RAM (Not every time but enough times to ensure some sort of coherence if the cache and RAM are out of sync) you lose a bit of performance but gain a stabler setup.

Disassemblers for JIT won't be as complex as a JIT assembler just because when you disassemble a piece of code you treat is under the black box principle (What goes in and what goes out) in order to derive the fundamental principles and algorithms, which can be implemented in 1E999 different ways, so if you disassembled the less optimal morph, bad luck... disassemble another test run and see if you get a better implementation of the algorithm. Or you could use the aforementioned principle and implement your own algorithm.

JIT code/compilers, BTW are also being tested to produce self modifying chips (ASIC's and FPGA's) under VHDL/Verilog using also NNets or Gen. Algorithms to obtaing a first implementation and then using the JIT to optimize it...

Really interesting stuff (The Transmeta Crusoe) and my bet is that soon other companies will follow altough not for the same reasons as Transmeta :)

ZoeSch

--
I hate to agree with davecrazy but...

Imperative languages are hard to reason by Chuan-kai+Lin · 2000-01-26 23:26 · Score: 3

I do agree that machines should spit better code than humans do, but we are just not there yet. When will we get there? Maybe tomorrow, maybe next month, maybe next year, maybe next century. Nobody knows. Suppose we do get there tomorrow, I believe it will not be Java + JIT that made it, for the following reasons:

Java is an imperative language, which are very hard to reason about because of the lack of referential transparency. Yes, I know that it is a known fact in computation theory that it is impossible to reason about arbitrary programs, but the fact is that we do not write arbitrary programs. Imperative programs with destructive updates are hard to reason about, and as a result we are almost always restricted to local optimizations. Local optimizations are great, but I am afraid that that alone will not be enough to break the barrier.
Vast amounts of information is lost once the source program is compiled into bytecode, therefore making effective optimization much harder. You can tell the intention of the programmer from the source code, but it is hard to do so for machine instructions (or even preprocessed source code, which anyone who had read the nVidia XFree86 source code must agree). The JIT compiler must create code that acts exactly the same as the bytecode, and that restriction severely reduced its options.

If that breakthrough were to come tomorrow, I believe it is most likely to come from the functional programming people. I believe that is the way to go for the future.

heap and stack by kmcardle · 2000-01-26 21:35 · Score: 2

I don't see where your argument about placing memory for new() on the stack holds up. The point of dynamic object creation is to allow the program to create an aribitrary number of objects and then free() them when done with them.

Trivial example that really makes the stack argument fall apart:
Let's say I have 500 objects new()ed. I'm assuming that the first object created would be on the bottom of the stack, and the last object created would be on the top. I now delete the first 499 items. I now have 499 * object size bytes of memory on my stack that I can't use becuase I still have the 500th item on the stack.
I can't recover the memory because it's on the stack.

Am I missing your argument? Help me out here.
--

--
then it comes to be that the soothing light at the end of your tunnel is just a freight train coming your way

Re:heap and stack by ucblockhead · 2000-01-26 22:41 · Score: 2

I think that what he is saying is that sometimes you could gain a lot of speed in java by allocating on the stack. This has been one of my biggest problems with Java, the way everything is done in the heap, not the stack. If an object is only used in one function, (or functions it calls) it makes perfect sense to put it on the stack, not the heap.

(I'd post an example, but I'm not a Java guy, and don't want to screw up my syntax in public.)

I used to do something like this in C, manually. Suppose you have this:

void Foo(int y)
{
char *Bar=malloc(y);

/*dostuff*/

free(Bar);
}

With some compilers, you can save time and avoid memory leaks by allocating on the stack like this:

void Foo(int y)
{
char *Bar=alloca(y); /* Compiles roughly to Bar=sp, sp+=y (or is it sp-=y?)*/

/*dostuff*/

/*No free needed, popped off of stack.*/
}

I'd think that if a Java optimizer were smart enough to know when to do this, it'd speed things up tremendously for some programs.

--
The cake is a pie

provide some examples then... by SEAL · 2000-01-27 03:46 · Score: 2

"garbage collection is slower than hand coded memory management" - You really ought to read some of the latest garbage collection papers. This is no longer true even for collectors managing C or C++ allocation. When given an environment like Java which doesn't allow pointers and other antiquated memory techniques a good dynamic compiler with a modern garbage collector is both faster and more efficient than any hand coded attempt on all but the most simple of applications.

Now, I have yet to see anyone provide concrete examples of a garbage collector performing better for the general case, than standard memory management. However, let's assume that the garbage collector provides a small performace gain overall.

Unfortunately, as others have pointed out, compilers don't understand the objective of the program they are dealing with the way assembly coders do. The same is true for a garbage collector.

I can speak from experience as a game programmer here. There are certain parts of a game which are time and performance critical. How can I trust a garbage collector not to slow things down in that section? Well, I can't. There has to be some form of programmer interaction telling it not to cleanup in section X of the code.

This isn't very practical for games though. Typically we want full control over the memory. We know the upper limit of what we need, and we don't care much about other applications. So we can allocate a large chunk at start time and then work within that, to avoid the new/delete/malloc/free overhead.

If removing "antiquated" things like pointers is the best solution, then we'd all be writing games in Java. OK well that's a stretch but you see my point...

Best regards,

SEAL

JIT by any other name by TheDullBlade · 2000-01-27 01:16 · Score: 3

"1...not a JIT system but a dynamically optimizing compiler" a slow start is a big problem (which is why they are emphasizing long-term server programs), but that's not the main issue. There's no reason that profile-based optimization can't be applied to C programs. That there's an experimental JIT (yes, it's still a JIT) that's ahead of the C compilers in common use is no reason to claim that the JIT strategy is better.

"2...as long as the results of the operation don't change" Which means that if the representation is different in a meaningful way they have to check whether the results of the operation change, reducing the efficiency, which was my point.

"3...on all but the most simple applications" But I rarely do anything but the most simple applications. I allocate an array, then I free an array. I do this for hashes, trees, stacks, queues, and pretty much any data structure I use. Only when I am being extremely lazy will I allocate and free objects recursively. Modern CPUs with slow main memory and fast cache like it when you keep all the data together, so I generally do so. I also try to access it linearly, which they also like. The point holds.

Don't try to compare the most primitive and unoptimized methods of custom memory management to the most sophisticated garbage collection algorithms. Remember, too, that a person doing custom memory management can write the same sophisticated garbage collector that Java would use.

And don't confuse "primitive" with "fundamental". Pointers are fundamental. Having to analyse reverse polish code function to prevent dangerous operations, rather than making it impossible to construct operations you don't want, is primitive.

"4...then tell me that dynamic dispatch is a problem." Okay, dynamic dispatch is a problem. Just because it can inline methods here and there doesn't mean you won't take an overall performance hit. Besides, that was only one example of the ways the lousy Java handholding slows things down.

The Sun marketing papers you linked me to are just more of the same old Java hype. They'll go on to "prove" that their new super JIT is better than anything you could code by hand by comparing optimized idiomatic Java with the same code translated poorly into unidiomatic C++.

Idiomatic C written by a competent optimizer will still blow the Java version out of the water.

JIT by any other name smells just as bad.

--
/.

Very interesting, but not why I pick Java by evilpenguin · 2000-01-27 04:10 · Score: 2

This is very intersting stuff, and I do care about these questions, but virtually none of this enters into my selection of a language for a task.

When you are writing a routine that reads single keystrokes and puts them in a buffer, you really don't care much how optomized the loop is unless it takes more time than elapses between the user's keystrokes.

In other words, I'm very glad people are working on smarter compilers and Java optomizations, but I select Java for one reason alone:

It is easier to write more code more quickly with fewer bugs in Java than in C or C++.

Most of us in MIS find ourselves in situations where we can throw more hardware at things much more cheaply than we can pay people to write and debug code.

Certainly there are situations where we are prcoessor-bound, or where data comes from a high volume network source and then we must consider performance, but since in many cases even a slow language is "fast enough" for the maximum throughput, the ease and reliability of development come to the fore.

So, if I want a fairly portable device driver, I write in C. If I want a stable application, I choose Java. For me it is this feature of the language, not its speed (or lack), not its "write once, run anywhere," but rather the ease with which stable code may be written that draws me to Java. After that, I am concerned more with the footprint of the VM and its impact on non-Java system components than I am with the optomization of Java byte codes.

I say this not to disparage this discussion, which I really do find enormously interesting, but I must say that for many of us, this is the computing equivalent of counting angels on pinheads.

Turf war - morphing vs. JIT by homunq · 2000-01-27 04:17 · Score: 2

I'm surprised nobody else has commented on the obvious here. Sure, maybe dynamically recompiling is awesome. But doing it twice at once isn't. Just imagine your Crusoe chip screaming in agony as the code-morphing tries to profile and recompile code that is being dynamically recompiled by the JIT compiler. And how do you fix that?

By abstracting your two smarty-pants recompilers into a "hardware" and an "interpreter" layer, you've removed any way for them to talk to each other or even really be aware of the other's existence. The problem's not theoretically impossible, but the practical advantages of keeping your abstractions clean make me worry about the hopes for Crusoe web-pads (which would presumably run a lot of Java).

--
Preferential Voting: easy as 1-2-3

OT: computers at blitz by Daniel · 2000-01-27 04:26 · Score: 2

Actually, computers tend to be even better at speed chess than at regular chess. The reason is that computers analyze games by expanding the game tree as far as possible. Because the chess game tree grows exponentially, this leads to diminishing returns as the amount of time increases; if you watch the "ply searched" indicator on a chess program it'll generally jump to somewhere between 3-5 moves almost immediately and then appear to stall. However, even this shallow game tree is enough to catch an unsuspecting human in a tactical trap.

The reason humans can still beat computers is that they can apply long-range strategic planning -- "what are the best ideas to pursue in this position?" -- and rules of thumb or experience about what sorts of things are likely to happen in a certain type of position. Working from this, they then decide on a small set of moves to evaluate, usually in a narrow tree (because humans can generally discard moves which are irrelevant out of hand; I believe some better computer algorithms try to do this as well but I'm not sure) This approach doesn't really start to pay off until the human has had some time to think, but once it does it can allow the human to catch up to the computer, which is still examining every possible outcome of shuffling the rooks back and forth.

Disclaimer: all of the above is empirical guessing and may not bear any resemblence to reality.
Daniel

--
Hurry up and jump on the individualist bandwagon!

Re:OT: computers at blitz by Daniel · 2000-01-27 08:10 · Score: 2

I haven't heard that, but I have heard of extremely similar experiments, where weak players and strong players were given a similar challenge, but with both positions from real games and randomly generated positions. The novices did equally well (or poorly) on both types of positions, while the strong players performed significantly better on the 'real' positions than on the random ones.
I don't remember where this information comes from, it was probably some chess book or other :)

Daniel

--
Hurry up and jump on the individualist bandwagon!

Re:Compilers dont write better code than humans by petermcanulty · 2000-01-27 01:27 · Score: 2

This is a personal anecdote where I would say that the compiler generated better code than I would have.

I worked on an embedded network analyzer product using the intel gcc compiler for the i960. It contained about 16000 lines of C code (not trivial, but not large) that included a bare bones operating environment, network drivers, and the actual analysis code. Of these ~16000 lines, about 500 of them were in the "common critical path" and would execute on every incomming packet.

The intel gcc compiler for the i960 has this is really cool feature called "profile based optimization". You build an instrumented version of the executable, run a set of test data through it, get the profile data, and re-compile with the profile data. This optimization feature had the added advantage of closely integrating both the compiler and the linker.

The main benefits of this optimization included:

Setting the branch prediction bits for the most common cases.

Inlining functions with high hit counts : both within a module (traditional inlining), and across many modules (an advantage of linker integration)

Placing variables with high hit counts in faster memory (the i960 had 1k of on board, single-clock DRAM, and our card had both 2-clock SRAM and 4-clock DRAM)

Inlining all functions that get called only once in the executable.

I saw a 150% performance improvement over the non-profile based, but otherwise fully optimized builds.

The key 500 lines were interspaced through about six functions in four modules, often in functions where half the function was error/unusual path handling. Although I could have rewritten those 500 lines in asm, it would have been very hard to keep it integrated with the rest of the code. Also when I examined the instruction trace of those 500 lines, I found only two or three places where I would want to change things.

Intel's x86 compiler is supposed to have similar a set of profiling optimization features and another feature that's even more important on the x86 that lays out functions and data in the executable to reduce cache misses and page faults based on the profiling data.

In the "me too" catagory: Optimizing for Alpha processors is a nightmare best left to the compiler. Just try to single-instruction-step through some Alpha code and see how much you move around within the C source.

Peter (first /. post!)

Re:Not so obvious by jilles · 2000-01-27 04:43 · Score: 2

I had a commodore 64 once. Never bothered to learn assembler code for it but even today I'm amazed by what people did on those machines. But it just doesn't scale up to a modern PC.

Perhaps those people from Intel can write code for trivial software but anything that goes beyond that is even beyond the minds of those people. Of course those guys at Intel can program their own CPU's manually (at least I hope so). Hand coding millions lines of assembly code is something else though, let alone optimizing it. The problem doesn't scale linearly you need a computer to optimize that.

Either way a static compiler nor a human being can compete when you start to involve real time profiling information. It gives the dynamic compiler an edge, it can optimize to situations only known at run time.

So, I hope you see my point. I admire people who can handcode a more optimal program than a compiler but they can't compete with dynamic copilers such as transmeta's code morphing or SUN's Hotspot because they simply don't have the kind of information those compilers have.

--

Jilles

Re:Compilers dont write better code than humans by coreman · 2000-01-27 01:29 · Score: 2

There is certainly a range of people that write code and I've seen both extremes. Poorly written higher level language programs aren't going to be improved by a compiler. There might be suggestions by a good level syntax checker but that isn't the topic here. One of the long term comments I've always made is that you can write Fortran in any language. If you don't understand the language constructs sufficiently, you aren't going to be saved by the compiler. There is no substitute for programming competence.

Re:No. Hand coded assembler is faster. by GnrcMan · 2000-01-27 04:53 · Score: 2

This post clarifies what I was saying. To summarize: 1. By "modern processor" I mean a processor that allows a compiler to perform good optimizations. The X86 instruction set is not very good for this. I don't believe the 486 even performed any parallel execution. Specifically, I have experience with the Alpha. 2. You are talking about a specific algorithm. As I said, 99% of the time the compiler will beat you. I didn't say assembly was dead or not useful sometimes. Just that most of the time the compiler will generate faster code overall than a human. 3. Facinatingly enough there are cases where your assembly optimization will actually hinder the compiler from doing a better optimization. Specifically, some paralellizing compilers I think.

Check out this link for a plethora of information on compiler optimizations.

--GnrcMan--

Re:Not so obvious by jilles · 2000-01-27 01:32 · Score: 2

Your programmer would have to be really good to beat a modern compiler on optimizing for very complex instruction. Also he probably would need a lot of time even to get very simple programs up and running that way.

And then there's still something a java VM (I'm assuming the hotspot type here) and that is dynamically (i.e. while the program is running) optimizing code using profiling information. This also works in situations where a program dynamically loads a class (possibly a class that did not exist at the point the other classes were developed). Your programmer cannot not take this situation into account when he is coding.

In other words, under normal circumstances the assembler superman would lose (badly).

Now, your next question will probably be why java is so slow then -> see my other reply's on this story.

--

Jilles

Re:Compilers dont write better code than humans by GnrcMan · 2000-01-27 05:06 · Score: 2

As a side note: I remember hearing FX!32 also keeping some information from the JIT compilation and storing it on disk for the next time the
application is run. Anyone knows whether this is still used and to what success?

Yes, FX!32 would actually profile the application and convert parts to native Alpha code, storing the native code in a database. Incidentally, the profiling happened during execution, while the optimization happened during free time, after the program was done being executed. The more you ran the program, and the more code paths you hit, the faster it would get.

It's a moot point now, though, since NT doesn't exist for the Alpha anymore.

--GnrcMan--

Re:Compilers dont write better code than humans by technos · 2000-01-27 01:34 · Score: 2

Massivly parallel creations are largely very close to what the programmer wrote, good compiler or no. Compilers for this case are especially weak, and only make vague and half correct assumptions.

--
.sig: Now legally binding!

short answer by TheDullBlade · 2000-01-27 01:39 · Score: 2

I'll simplify:

1)Anything that can be expressed in Java can be expressed in hand-coded assembly; the reverse is not true.

2)Humans are smarter than computers.

3)Humans can always use computers as aids to their work, and so produce output equivalent to what the computer can produce as a baseline worst case.

--
/.

just plain wrong by TheDullBlade · 2000-01-26 23:44 · Score: 2

On a machine with 64 bit ints, as on a machine with 32 bit ints, ints and longs are the same (assuming the compiler was competently designed; it could fit the standard without being). Using long should only ever confer a performance hit on 32-bit machines, for 32-bits and over, long is pretty much guaranteed to be the same as int (AFAIK, the standard doesn't force it, but it doesn't require any more range in the long, so it would be stupid to make it bigger).

If you want to define your own special int ranges, you can always use Ada. That's the kind of language you get when you throw in every nice little feature.

--
/.

Re:What about the JIT? by um...+Lucas · 2000-01-26 23:45 · Score: 4

I read that the 700 MHz Crusoe chip actually only attains the performance of a 500MHz Pentium (II or III, I forgot - oh, can Crusoe emulate SIMD? Not like it matters, but I'm wondering).

So there you have it - code morphing, or just in time compiliing or whatever gives you about 70% of the performance precompiled code.

That's not to downplay Transmeta's theoretical accomplishment. Those extra 200MHz go to things like monitoring it's VM, throttling the clock, etc. So in the end you lose 30% of the performance but make it up with 35X less power consumed.

It's a great trade off for laptops, handhelds, etc. But not for workstations, servers, etc...

I still don't like the idea that they're keeping the instruction sets closed. It would seem like if someone out there wanted to port GCC to Crusoes native instructions, that would be good... But they just don't want to be percieved as being at all incompatible with Intel, i guees.

Wait a sec. by aheitner · 2000-01-27 01:43 · Score: 2

I can also write code in ML that can be proven correct at compiletime, since it's statically typed and lexically scoped (Java can still come up with an exception and die on you).

So my question to you is: How are your languages so magic if none of the software I use on a daily basis uses them, or seems to miss their properties?

Some guys here at CMU wrote a TCP/IP stack in ML that was provably correct. It took them a whole year. Was it worth it? You could write a whole operating system in that time!

...

The stack-based ISA for Java is slow. It doesn't map well onto actual processors. It's a big reason that even JITC'd Java isn't very fast.

There's no reason not to use a language like Java with slightly stronger typing than C++ (although there are good reasons to use C++; there are powerful things C++ can do that Java can't). But it should be compiled, even if "compilation" is something that happens the first time you run a program, and it should be compiled from a more intelligent format than a nonexistent ISA. Why not distribute source code after the frontend has run on it, with all the symbols resolved and the syntax broken up into lower-level constructs? That would be as hard to reverse-engineer as assembly, and significantly moreso than Java classes (which leave way too much info in there).

...

Being able to state with supreme confidence that this type of bug does not exist in my code because it has been rigorously proven is helpful. You really should be more concerned with running correct code than running "fast" code.

Except for the fact that we've gotten everything done so far by writing fast code, and testing it to see that it's reasonably bug free. Proving code with Java is still quite hard, and writing code at all in ML is very hard.

Empirically, companies will trade off getting the software written for having it proven; they've been doing things that way for quite a long time. Java certainly won't change that.

Re:No. Hand coded assembler is faster. by GnrcMan · 2000-01-26 23:49 · Score: 2

No. Hand coded assembler, coded by a clueful programmer, will always be better than compiler generated code.

You are wrong.

With modern processors, a decent compiler can do better general optimizations than you or any other assembly programmer I've ever heard of. While there is that (less than) 1% of the time when you can hand optimize a small piece of code, that's the exception rather than the rule. The rest of the time you are better off letting the compiler do it's job.

--GnrcMan--

Re:Don't rely on the Compiler to Optimize by PigleT · 2000-01-27 01:48 · Score: 2

I think I agree... :)

There is indeed an art - if not a Zen - and a science to knowing when to optimize stuff, and I think it boils down to 'use the hottest algorithm around' (especially if you start in C), *followed* by 'hack the ASM by hand'.

Something I finally got round to doing just this past weekend, that I'd been meaning to do for a while, is it compare the speed of execution of loops that simply count between 0 and a big number, but using different comparison methods, and different compiler optimization levels in gcc, with Perl for the hell of it as well.

The full source I used follows:

#include
#include
#include

#define INNER 5000000
#define OUTER 200

void up(void), down(void), xor(void);

int
main (void)
{
int c;
time_t t0, t1, t2, t3;

t0 = time (NULL);

for (c = 0; c Mosix cluster, which was "interesting", to say the least... ;)

As for perl, well that behaved the way I expected. The whole thing was way slower than C, but it did at least do an xor when I asked for it - that was almost invariably faster than counting up with "".

Well, those be the results... compile the source yourselves with all the various gcc options and compare the assembler, folks :)

--
~Tim
--
.|` Clouds cross the black moonlight,
Rushing on down to the circle of the turn

Re:What about the JIT? by BinxBolling · 2000-01-27 01:49 · Score: 2

I read that the 700 MHz Crusoe chip actually only attains the performance of a 500MHz Pentium

So there you have it - code morphing, or just in time compiliing or whatever gives you about 70% of the performance precompiled code.

Comparing interpreted to compiled code on a MHz-for-MHz basis makes no sense, because MHz is not an intrinsically interesting number. Most people don't care about what their computer's clock speed is. They care about how fast it runs applications (i.e. MIPS), and how much it costs. So don't compare Crusoe to an Intel chip on the basis of a MIPS/MHz ratio. Compare it on the basis of a MIPS/$$$ ratio. How well does a Crusoe perform compared to an Intel chip *of the same price*?

Re:Don't rely on the Compiler to Optimize by PigleT · 2000-01-27 01:50 · Score: 2

Aaaargh!
CmdrTaco, your preview mode is broken. That worked for me before.

In any case, the source is available here.

--
~Tim
--
.|` Clouds cross the black moonlight,
Rushing on down to the circle of the turn

Re:What about the JIT? by BinxBolling · 2000-01-27 01:51 · Score: 2

Comparing interpreted to compiled code

Complete brain fart. That should read "Comparing Crusoe to Intel..."

Re:Doesn�t work in practice by cje · 2000-01-26 23:57 · Score: 2

Its wrong to have the most basic data type wobble over the bit length.

No. This simply wrong, and almost humorously so. What are you going to do? In the C standard, are you going to say "Implementations shall make int 32 bits wide?" What do you do on a machine (say, an embedded system) with 9-bit bytes? What about a Cray YMP? The goal of the C standard is to have the implementor pick representations for the basic types that make sense for the target machine. Performance is nice, but in the end, they have to make sense. Now, there are some limits in the form of minimal ranges that are placed on the "basic" integral types, but that's about it.

The solution would have been to have the basic data types with fixed size, and additional data types with "some size between 16 and 64 bit giving the optimal performance on the target machine".

You should be more than pleased with the next revision of the C language standard, C99. Among other things, it standardizes such types as int16_t and int32_t, which provide exactly the functionality that you seem to think is important to portability.

This would have saved many people a lot of grief...

What would have saved many people a lot of grief is if they would have learned C correctly in the first place. They would then know what it guarantees and what it does not, and how they can use the language facilities (i.e., typedef to satisfy their non-portable desires.) The amount of junk I see in code written even today is staggering! (assumption that short is 16 bits, long is 32 bits, etc .. casting the return value of malloc(), etc ..)

--
We're going down, in a spiral to the ground

just plain wrong (me this time) by TheDullBlade · 2000-01-27 11:13 · Score: 2

Hmm... That bites.

If you ask me, it is a design flaw. Longs are only required to have the range of a 32-bit number, so it is wasteful to include more bits. A lot of portable code will have unnecessarily bloated memory use and any reliance on the extra 32 bits throws portability out the window. Given that, a non-portable extension for 64-bit ints (since full use of 64-bits is inherently non-portable; with a lot of sizeof checking some use is possible, but not convenient) to be used only for system specific optimizations would probably be a better choice.

Given that most code run on Alphas probably isn't written with a 64-bit long in mind, it almost certainly hurts more than it helps (especially with memory access being the main bottleneck of modern systems).

The next C standard should fix this problem by providing a 64-bit int (if you ask me, they should just allow you to stack longs, doubling the bits of the minimal range each time; writing arbitrary bit size emulation into the compiler is trivial, though the trivial implementation is inefficient, and it would be a permanent fix that could have surprising benefits for certain supercomputer applications), as well as the addition of fixed bit size types (which is, I suppose, good for networking code, though I fear people will abuse it; IMHO it's better the way it is where you should chop everything up manually into whatever format the protocol takes).

--
/.

Re:just plain wrong (me this time) by TheDullBlade · 2000-01-28 02:55 · Score: 2

"64 bit operations on an alpha are no slower than 32 bit operations, so it's not really being wasteful."
Think of the memory consumption (if it's not being wasteful, why then did you claim that in this case using long was sub-optimal?).

"IIRC, C requires long to be large enough to hold a pointer. On non-NT alpha, that's 64 bits."
IIRC, that's an old K&R thing and in ANSI C, pointers and ints are completely different animals and can never be inter-converted. Or maybe that's just the upcoming new ANSI standard.

"Why use a 64-bit chip if you're only ever going to use half of it?"
Indeed, why use a 64-bit chip at all? (I don't like the current trend of building bigger and bigger chips; we should be learning to make more and more of them work together) But more on topic, very little portable ANSI C code will make profitable use of a 64-bit long (yes, you can write code that checks just how big it is, and do, say, bit masking operations on more bits in parallel, or write a more efficient BigInt library). If you are going to write non-portable code, why even bother sticking to the standard? I like the "long long" solution that some compilers use; for one thing, it immediately tells any other compiler that this code isn't portable ANSI C.

--
/.
Re:just plain wrong (me this time) by TheDullBlade · 2000-01-28 09:29 · Score: 2

"I don't remember saying that"
Sorry, you didn't, that was AndrewHowe. I wasn't keeping track of who is on this thread.

"I'll take your word for it"
Don't take my word too seriously, I don't have the C spec either. I'm going on what other people have said about the standard. And like I said, it may have been about the upcoming standard, not the present one (you've gotta love it, it's finally been around long enough for people to comply and then they go and change the thing).

"However, without an integer type big enough to hold a pointer, how are you going to compare a pointer to another using operators like do wish that methods of handling larger ints were specified; I mean, even the x86s have special features to handle 64-bit ints, which you just can't access using ANSI C). Actually, I'm rather fond of the idea of 24-bit chips. Chuck Moore thinks 20-bits is enough, and I really respect him, but he also thinks it's an acceptable tradeoff to use a character set where ones and lower case Ls are the same (I really recommend reading around his web site, while he might be a little over the edge, he still has a lot to teach the rest of us).

The fewer excess bits you use, the smaller and faster (not to mention cheaper) the chip can be.

"So that the code can run on 99.9999% of the machines someone may want to run it on, instead of, say, 85%, or 50%, or 1%, depending on how far you deviate from the standard and what assumptions you make?"

Nothing that depends on a 64-bit long (and very little that can benefit from it) is going to run on 99.9999% of machines (it will, in fact, be broken on many 64-bit machine compilers), which was my point.

--
/.

Re:Comparing JIT Misses The Main Point ! by PhiRatE · 2000-01-27 05:39 · Score: 2

Somebody moderate this guy UP. He's hit the nail on the head with a huge sledgehammer, this is _exactly_ why code morphing is different, and far more spectacular, than the JIT concept.

Although I still want access to the morph layer to play :)

--
You can't win a fight.

Sun license? by PapaZit · 2000-01-26 21:37 · Score: 3

Doesn't Sun have a patent on hardware that can run java bytecode natively?

I wonder where transmeta-ish devices fall. I'd call it software, but would sun's lawyers?

--
Forward, retransmit, or republish anything I say here. Just don't misquote me.

Compilers dont write better code than humans by tjansen · 2000-01-26 21:38 · Score: 5

I have already heard that assumption that a compiler can generate better code than a programmer a thousand of times, but it does not get more true by repeating it - it is false. At least until compilers are able to understand the program. In every program there are things the compiler simply doesnt know. For example, in x86 assembler it is possible to save some cycles (and memory accesses) by using 16-bit or even 8-bit registers instead of 32 bit registers. You can double the number of registers by doing it. You need to be sure that the values in the registers are below 65536 or 256 to use these tricks, and the programmer can know this, but the compiler cant. The compiler might profile the possible value range if it is a really advanced compiler (I never heard of any compiler actually doing this), but it cannot be sure so it must at least check the values before for the case that they arent.

As long as the programmer has more knowledge than the compiler, he will always find tricks to save an instruction here or there and outperform the compilers this way. You can find a great example of such programming tricks in the PCGPEs article about texture mapping inner loops here.

Re:Compilers dont write better code than humans by Hard_Code · 2000-01-27 02:15 · Score: 3

Right. I've always thought an ounce of design is worth a pound of optimization ;)

Leave the design to the humans...leave the optimization to the compilers.

Jazilla.org - the Java Mozilla

--

It's 10 PM. Do you know if you're un-American?
Re:Compilers dont write better code than humans by GnrcMan · 2000-01-27 12:59 · Score: 2

Incidentally, the optimizer hint you describe is called "__assume" in Visual C, and it does exactly as you say. So your code would be:

switch (x) {
case 0: ...
case 1: ...
...
case 23: ...
default: __assume(0); // Same as 'notreached'!
}

normally you'd have a define like:
# define ASSERT(e) ( ((e) || assert(__FILE__, __LINE__) )
#else
# define ASSERT(e) ( __assume(e) )
#endif

in order to hide this, but you get the point.

Which brings to mind an optimizer bug that I came across (and fixed) which had to do with a nested turnary operator with two assumes:

__assume(foo() && x==something),bar(&x) ? dosomething() : x == foo ? somethingwierd() : __assume(0);

That's complete nonsense of course but it had the same form. This expanded out to a huge affair in the intermediate code the optimizer uses (one of those embedded functions was inlined(!!)). It was fun. Oh well, I'm sure no one is interested.

--GnrcMan--
Re:Compilers dont write better code than humans by BinxBolling · 2000-01-27 02:36 · Score: 3
You need to understand the code fragment you are coding to do the must useful optimizations which are algorithmic ones. A badly coded better variant of an algorithm will beat well coded, but inherenetly slower version.
Absolutely. But one need not be programming in assembly to do the sort of algorithmic optimizations you speak of. In fact, in higher-level languages, where source code content is determined largely by the algorithm being implemented (rather than by the low-level details of the implementation, as in assembly language), you have a better chance of 'seeing' and successfully implementing a faster algorithm. This sort of thing is one of the big reasons why assembler loses in the long run.
It seems to me that there are two basic types of optimization task:
- Choosing the fastest algorithm for a particular task.
- Making a given algorithm run as fast as possible on a given piece of hardware.
Of course, this is a bit of an oversimplification, since it assumes that the fastest algorithm for a given task is going to be the same on all platforms. Usually, this is true, but probably there are a few minor cases where it isn't. But it works most of the time.
At any rate, humans are much better than machines at the first type of optimization. But machines are much better than humans at the second (even if only because normal humans don't have the patience required to do the enormous amount of repetitive analysis that it requires).
Re:Compilers dont write better code than humans by dingbat_hp · 2000-01-26 21:56 · Score: 2

16-bit or even 8-bit registers instead of 32 bit registers. [...] need to be sure that the values in the registers are below 65536 or 256 to use these tricks,

Isn't that trivial old-tech for compiler optimisation ? If your type system is strong enough to know that it's an 8 bit operand, then you're only ever going to need the 8 bit register.

You can argue that multiplying two int8 only really needs an int16 result if the initial values are big enough, which is fair comment in favour of human optimisation. OTOH, it's not a rocket science extension to the type system to allow an int4 type to be declared, even if it's always held in an >=8 bit register and the compiler can then safely optimise that to place the result back into an int8. That way you even improve the code quality, as there's no discrepancy between the valid and the apparently valid numeric ranges, as you get with the human optimisation and the non-obvious limiting of the initial int8 to be <0xF

Anyone remember BCPL ? 8-)
Re:Compilers dont write better code than humans by Junks+Jerzey · 2000-01-28 04:05 · Score: 2

No.
Re:Compilers dont write better code than humans by Mr.+Slippery · 2000-01-26 22:14 · Score: 4

You need to be sure that the values in the registers are below 65536 or 256 to use these tricks, and the programmer can know this, but the compiler cant.
Isn't this why we have short int, long int, and char data types - so we can tell the compiler these things?

--
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Re:Compilers dont write better code than humans by AndrewHowe · 2000-01-26 22:40 · Score: 4

In some ways I agree with you, OK, I love writing assembler stuff.
But I'm trying to separate fantasy from the reality here.
The fantasy is that you're some super studly asm c0der who can not only produce properly scheduled code, but also has the time to *completely re-write* it:
a) For each new processor
b) For each sub-type of that processor with different internal resources
c) Every time the specification changes - Maybe you have limited the range of a variable or something.
Any of these changes will require a huge amount of work. Even a small change, coupled with the pipelined nature of modern processors, means a large modification to re-establish optimal resource usage.
The reality is that no-one has the time to do this, and even if they do, they should be doing something else instead. High level optimizations almost always beat low level tweaking.
In addition, I simply don't believe it's impossible to
a) Tell the compiler in more detail how to optimise your code
b) Have the compiler suggest ways in which it could produce better code (for example, "Is it OK to assume there is no aliasing of this item?" or "Hey, if I make this variable 8 bits in size I can do this huge optimization")
c) Have the compiler work this stuff out for itself
d) As (c) but at run time, guided by profiling information.
Some of those are hard to do right now, but I don't see anything that would make them impossible.
Does anyone?
Re:Compilers dont write better code than humans by Namarrgon · 2000-01-26 22:31 · Score: 3

It's just like Deep Blue vs. Kasparov again. Brute force compilers vs. intuitive & creative humans. In some cases, humans can write better assembly than compilers (thanks to knowing more about what the program is intended to do), and in others the compilers have the advantage (thanks to being better at scheduling instructions on today's pipelined, superscalar CPUs). JIT compilers could have even more knowledge about instruction timing, but have less time to think about it. Kinda like Deep Blue playing blitz chess. I'd say in most cases these days, a compiler could write faster code from scratch than a human could, unless the human really spends some time on it, but the optimal combination will always be a human assisted by a compiler (and profiler, etc). Best of both worlds. Namarrgon

--
Why would anyone engrave "Elbereth"?
Re:Compilers dont write better code than humans by GnrcMan · 2000-01-26 22:57 · Score: 4

That may be true of a crappy compiler on a lame platform (x86)...but I'd like to see you try reordering and slot scheduling Alpha instructions by hand to enable optimal pipelining and branch prediction.
As someone who worked firsthand on a compiler for the Alpha, I can tell you that for 99% of the cases, the compiler can perform better general optimizations than a human. That 1% left over is what you should be concentrating your efforts on.
Historically, the point of a CISC processor was to reduce compiler complexity. So of course hand coding optimal X86 assembly isn't going to be difficult.

--GnrcMan--
Re:Compilers dont write better code than humans by locust · 2000-01-26 23:01 · Score: 5

I have already heard that assumption that a compiler can generate better code than a programmer a thousand of times, but it does not get more true by repeating it - it is false.
The metric that I given in my university computer architecture class was that a compiler can generate better code than 90% of assembly programmers. Which, when I examine higher level code written by a variety of programmers, some good, some bad, makes sense. Some people know all the details of a language and some don't. Some pay attention and some don't. I'm pretty sure it was backed up by a reference to an IEEE paper somewhere.
If large numbers humans could reliably write large volumes of high quality, optimized code, at high speed, we'd still be using assembly everywhere. But they can't. I certainly don't have the time or the inclination to learn all the ins and outs of every instruction set architecture I operate on to squeze every last bit of juice out of the machine. And I certainly am unwilling to give up the expresiveness of a highlevel compiled language.
But ultimately, the fraction of people that need to write things like texture mapping loops is very small, and the fact of the matter is that the loop is probably embeded in some higher level piece of code. So relax, compilers reliably produce better code than the majority of assembly programmers (maybe not on /.). Or more precisely, better machine code, than the majority of programmers, if they were asked to write assembly. --locust

Compiled Once vs. Compiled multiple times by JWRose · 2000-01-26 21:38 · Score: 2

If I'm understanding what you're saying correctly,
a JIT compiler will compile the exact same code each and every time you go to run it. While a convential compiler will compile it one time. Yes the JIT compiler might be able to tune the performance better for each run, but I think it takes a performance hit at the start by having to do the compilation. While the JIT compiler is compiling code, the exact same code compiled with the convential compiler is already executing.

The faster the hardward becomes though, the differences between the two might not be as noticible. Although, the more powerfull the hardware becomes, the more complex the code becomes.

Nothing exists exept atoms and empty space; everything else is opinion.

--

blah blah blah....

Re:Compiled Once vs. Compiled multiple times by MikeBabcock · 2000-01-28 11:18 · Score: 2

Once you've implemented a caching interface _at all_, you abstract the maintenance of that cache and allow the maintenance system (thread, whatever) to deal with keeping MRU or LRU indexes, or whatever modern DB-based system you want to use for good caching efficiency. In fact, you could try several over time (in a second, how many could a 700Mhz machine try?) in a small section of your cache and evaluate efficiency of each, on average, for a given user -- that's the point of this Chip+Software deal anyway.

Upgrade the caching maintenance code later if you're finding its not good enough ... just requires flashing the bios :-)

--
- Michael T. Babcock (Yes, I blog)
Re:Compiled Once vs. Compiled multiple times by ucblockhead · 2000-01-26 22:50 · Score: 2

If you kept the entire compiled code around (and not just part of it), you could keep profiling info, and only attempt further optimization the Nth time a block was run in a certain period, were N is some sufficiently high number. You could also be really smart and save the compiled code so that next time you execute, you don't need to recompile everything.

Imagine a program that speeds up the more times you run it...

Perhaps you could set some sort of "optimization threshold", which limits the percentage of time the system tries to improve over time. The first time you run, there is the complete overhead of compiling. The second and subsequent times, a 10% slowdown for optimization. After a few runs, with the data collected, the performance improvement beats the 10% overhead and you are better off.

--
The cake is a pie
Re:Compiled Once vs. Compiled multiple times by MikeBabcock · 2000-01-26 22:32 · Score: 2
The intelligent JIT compiler would do one of several things:
- Cache the compiled code on the hard drive for each program executed.
- Cache the compiled code for frequently seen sections (like fast symbol tables for headers in C compilers).
--
- Michael T. Babcock (Yes, I blog)

Genuine Question: have you ever compared?? by orpheus · 2000-01-27 12:51 · Score: 2

It's been over a decade since I did any appreciable assembly coding (I quit after the 80286), but I have a serious question, and I'd be interested in serious answers:

How many of you actually routinely (e.g. 6 times a year) examine the output of your compiler, and compare it to what you would have written?

I've done it often (a promise I made to myself as a foolish youth) and learned quite a bit. It's actually changed my higher-level coding technique on a few occassions (and I'm not just talking about optimizing for a better fit with a specific compiler)

Not only have I never met another person who does it, but I've almost never met a programmer who didn't look at me oddly for even mentioning the practice. And yet, here are so many people walking as if they did it routinely. I guess that's just the Silicon Valley edge.

I'm not being sarcastic or snide. I really want to know. Maybe I should give up my current profession (ditching a decade of grad and post-doc training) and move to the West Coast, where Programmers are Real Programmers.

Or maybe I should finish reading my NEJM, and we should all just keep in mind what we really know, and what we only think we know. Again, no disrespect, Lord knows I've had to re-think the extent to which I "knew" things before.

Please post your honest answer, and encourage your friends to do so as well. I'm afraid I'll have to take a lack of responses as a confirmation of my worst fears -- that everyone is only guessing.

--

If you can go to bed, knowing you did a valuable thing today, you're very lucky. If you can't... it's not bedtime

Re:Why Optimize to overkill? by mikera · 2000-01-27 02:12 · Score: 2

Well, a good optimization typically makes one type of instruction or sequence of instructions a fair bit faster (say 30-50%).

While this might give a 2% improvement across average code, it will make a lot of difference if this particulr optimization helps out your inner loop. There are times when you really need this.

The way I see it, if compilers can optimize effectively, then there becomes less need to hand-code an inner loop in assembly. Decent optimising compilers may not give you a great deal of extra speed but they sure as hell give the hardcore coder an easier life.

BTW, The rate of hardware improvement is a good argument *unless* you are trying to stay at the cutting edge. In these cases, optimizations are certainly not trivial.

possibly offtopic by speek · 2000-01-27 00:00 · Score: 2

Talking about dynamic recompilation and program profiling, and specifically about Hotspot, I've always wondered why I have to put up with the fact that between program runs, the compiler "forgets" everything it learned during previous runs? Why can't the compiler generate a log for each program, and review the log when the program is run again, so the compiler can pick up learning where it left off?

--
First, make it work, then make it right, then make it fast, then, make it bloated!

within X% of the performance of native code by The+Pim · 2000-01-26 21:40 · Score: 2

[slashdot bug reformatted the last post] As the microkernel unix people used to say before they finally died out, "We are within X% of the performance of the original". - Alan Cox The JIT guys can write all the papers in the world about how they should (yes, really should) be able to get native code speed. No, that extra level of indirection is really a performance benefit, not a hindrance! I'll believe it when I run Java bytecode and it's not dog slow.

--

The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.

Re:Profoundly counterintuitive? by bhurt · 2000-01-27 00:02 · Score: 2

First off, there are profound performance differences that can be acheived simply by optimizing for the Pentium Pro and not the Pentium, see The Twofish Implementations. This is mainly just instruction ordering issues- possible to do in a JIT. There's more to optimization anymore than simple cycle counting.

Second, memory management is a different beast under object oriented programming than under procedural programming. The holy grail of object oriented programming (as it were) is black box reuse- allowing a class maintainer to modify how a class works without those using the class needing to change their code. Not having garbage collection in an OO language either disallows true black-box reuse (which is a maintainability issue), forces the object to implement it's _own_ garbage collection (such as the C++ STL template basic_string does- and this is generally the worst sort of garbage collection, such as basic_string's reference counting), or forces the program to leak memory like a seive. If those are the options, then garbage collection is not too high a price to pay.

Especially when you consider that GC isn't that high of a cost. _Every_ memory management scheme that isn't static in nature that I've seen is O(n). Either on the allocation or on the free. Sooner or later you have to search for a hole. GC doesn't remove this penalty, but instead it allows you to _defer_ it until a better time. Since most programs spend most of their time waiting for input, there are lots of places where garbage collection can be free because the program isn't doing anything else.

Last, but not least, I'd like to point you at the The GC FAQ, a good resource for all things GC related. Highly recommended reading- after it, it was "intuitively obvious" that the world was flat, that the sun went around the earth, and that time didn't slow down as you went faster.

Why JIT is slow by 31eq · 2000-01-26 21:40 · Score: 2

When you run a Java program, the JIT has to do its thing before the program starts. That means the application takes longer to launch. For a lot of GUI apps, this is exactly where the greatest bottleneck is.

So the kind of optimization suggested probably wouldn't improve matters for the user, at least with the small, visual applications Java was supposed to be good at. However, it suggests the people using it for numerical work may not have been so mad after all.

If it works.

Of course, you could cache the compiled code, so that it launches better second time. Pretty soon, you'll end up re-inventing compile/run cycles.

Profoundly counterintuitive? by TheDullBlade · 2000-01-26 21:42 · Score: 4

Seems profoundly intuitive to me.

First of all, a JIT is rushed. You can design your optimizing one-time compiler to look at the code for better ways to do it all day, if you like, and while the developers will moan, they will still use it if gets them better results.

Secondly, Java has a very specific standard (that is typically fudged... but that's beside the point). It doesn't give any leeway for a program to act a little different in boundary conditions, like C does. In C, an int is whatever size int is most easily handled by the target system; if you need a certain exact size of int, you can code that in with some extra effort. In Java, an int is a Java int and Java doesn't care in the least what the most efficient native int format is.

Thirdly... automatic garbage collection is less efficient than hand coded allocation and deallocation, and dynamic allocation is less efficient than static allocation. There are odd cases where this is false, but they generally hold true, and where they do not, the C version can always use the more efficient method anyway. (and extremely fast calls to "new" can be easily achieved... at roughly 50% memory wastage)

Fourthly: Java locks you into a OOP model which is inherently inefficient (at least as done in Java). All function calls must be dynamically redirected etc.

I could go on, but it feels like trying to describe that water is wet and stones sink in it to someone who thinks it's intuitive that water should sink into stones.

However, I will not dispute that you can definitely get better results by recompiling for each chip that something will run on. Some JITs may use this to push out ahead of one-time compilers.

BTW, an experienced assembly coder will always beat or at least equal the optimizing compiler, because if nothing else works he can always look at what the compiler produces and see if he can improve on that. Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?

--
/.

Re:Profoundly counterintuitive? by MikeBabcock · 2000-01-26 22:30 · Score: 2

I was with you until your last statements:

BTW, an experienced assembly coder will always beat or at least equal the optimizing compiler, because if nothing else works he can always look at what the compiler produces and see if he can improve on that. Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?

The problem in your first statement is that you've defined three categories, not two. I would summarise as:

Good ASM coder + Optimising compiler
(is better than)
Optimising compiler
(could be better than)
Good ASM coder

After all, without the code produced the the assembler, the coder has to figure out the rest of the optimisations alone.

The second problem is with your question of problem solving. You're probably thinking on a philisophical bent, because you and I both know that software can be written to solve dozens, if not millions of problems faster than we can (digits of PI, etc.).

If it is possible for a programmer to reach 90% efficiency, but after 60% it requires doubling the project length to reach 75%, etc. but the self-profiling program can reach 80% efficiency in milliseconds, which is "better"?

--
- Michael T. Babcock (Yes, I blog)
Re:Profoundly counterintuitive? by BurdMan · 2000-01-26 22:31 · Score: 5
Firstly, water does sink into rocks. It then freezes and the pressure it exerts reshapes the world as we know it. But that is beside the point.

I will address your comments with respect to the Java Programming Language and its HotSpot compiler technology. If you would like to enlighten yourself on the techniques behind HotSpot take a look at http://self.sunlabs.com. Transmeta, IMHO, was influenced by the same concepts when designing its code morphing techniques.
1. "A JIT is rushed" - The first time bytecode is compiled to native code it must be done quickly to avoid delay. That code may be inefficient, true, but it can also be instrumented with profiler like information that can be used by later passes of the compiler. You see, what the author is describing is not a JIT system but a dynamic optimizing compiler which has the opportunity to study the program during execution and recompile parts or all of that program based on that information.
2. "no leeway for boundary conditions" - Well here again you seem a bit confused and your example is naive. The native representation of a type is not fixed in Java, only the conceptual representation. To be a compliant Java implementation all results generated using 'int' (to use your example) must be consistent with the conceptual representation of 'int' when using the various operations. If the optimizing compiler finds a way to represent an 'int' in a more optimal way it is free to do so as long as the results of the operation don't change.
3. "garbage collection is slower than hand coded memory management" - You really ought to read some of the latest garbage collection papers. This is no longer true even for collectors managing C or C++ allocation. When given an environment like Java which doesn't allow pointers and other antiquated memory techniques a good dynamic compiler with a modern garbage collector is both faster and more efficient than any hand coded attempt on all but the most simple of applications.
4. "OOP and dynamic dispatch are inefficient" - Again, I urge you to read the papers at the Self site listed above. The way the Self, and now HotSpot, compilers work eliminates this bottle neck. I'll admit in early implementations of Smalltalk and Objective-C dynamic dispatch did add a small amount of overhead. Take a look at the documents on the HotSpot here and then tell me that dynamic dispatch is a problem.
In summary, you are more than welcome to use assembly all you want. Code on my brother! But, please before you slam some other method try doing the smallest amount of research first. Maybe your snap intuition is wrong. You never know.

As far as Transmeta goes, it has a lot of the HotSpot/Self style technology and I personally think that technology is the future. I can't wait to get my hands on a Crusoe powered product.

-BurdMan

JIT is not the same as Language performance by J.Random+Hacker · 2000-01-27 00:05 · Score: 2

The problem that one tries to solve with JIT based language (say Java) is not the same problem that one tries to solve with a systems programming language (say, C) is not the same problem that one tries to solve with an AI language (say LISP).

I would not try to write device drivers in anything but C -- assembly is too unportable. I would not try to write an AI program in C unless I had an essentially unlimited budget -- LISP is much better suited to that task. If I want to interact with a WWW user -- Java is the best choice thus far. As to effeciency of the resulting program, the speed of assembly code is irrelevant in 99.99% of code, since the program spends most of its time in that 0.01% (OK -- I'm exaggerating, but not that much ;-). Ease of programming and maintenance is much more valuable -- that observation led to the decision to use C to implement UNIX in the first place.

"So what" you ask? While we talk about JIT and code morphing vs. compilation vs. hand assembly, we need to compare similar things. Language characteristics are nearly unrelated to performance (*implementation* is a whole other story...) In any HLL there must be some translation from strings of bytes to executable things, whether byte codes (emacs lisp), VM codes (Java), or machine code (any of the above under an alternate translator).

JIT tries to solve the problem of bridging the gap between one (abstract) instruction set to another. Code morphing does the same thing. Compilation does the translation from HLL to machine code in one block. The time available to a compiler to explore different register allocation strategies and branch orderings is much larger, but less information about the typical case is available than for the JIT. However, branch prediction is the only place where repeated JIT/morphing has an advantage.

BUT: I repeat -- they solve different problems. Compilers create programs to execute on a chip. JIT/morphing makes a program written for one chip executable on another.

The use/non-use of garbage collection does not tell us a-priori that a program will be slower than the same program without GC -- that depends on the allocation behaviour. There is considerable literature (both theory and practice) on this topic, for languages such as C (gasp!), C++, and LISP. Probably Java too, by now.

Compilers may miss optimization opportunities occasionally but the assembly code they generate is usually pretty good, and it is free of human-errors.

Finally -- Alan Cox mentioned the microkernel performance mistake. I think that it applies here, but not in the way you might expect. It is really easy to miss the real cause of performance problems. In my experience, choice of algorithm and data structure (even in OO code) have a much greater impact on performance than any other single factor (or even several other factors together). You should use language features to help with your task -- but don't be sloppy with them. That always leads to bad, buggy code, and poor performance.

Hmmmm -- still on topic? You decide ;-)

Cache misses and other nonlinear behavior by Frank+Sullivan · 2000-01-27 02:33 · Score: 2

In theory, a human can write better assembly than a compiler - in a vacuum. But code doesn't exist in a vacuum, and it becomes less of a vacuum every day. Consider these few issues:

* Cache misses: page faults that cause L1/L2 cache misses (going straight to memory, or worse, to disk) cost thousands, maybe millions of instruction cycles. Do you write all your assembly language to neatly break your data structures across paging lines? No? Didn't think so. btw, this level of optimization can be performed in C in many cases, by good C programmers.

* Pipelining: Do you hand-code your assembly language to give the best hints to the branch decision-making hardware so it can optimize pipelining? Really? You're better than me. Or maybe your code is so much better than compiler-generated crap that the branch lookahead hardware support is useless.

* APIs: I suppose you don't mind reverse-engineering every library call you might ever need, so you can avoid those wasteful, inefficient C headers.

* Et Cetera: The days where humans can outthink compilers in the general case died with the PowerPC and Sparc I. Those of you who have written 1,000,000+ line assembly language programs are free to tell me i'm full of shit. Those of you who have worked on million-plus line code in high-level languages are free to give me an AMEN!!

---

--
Hand me that airplane glue and I'll tell you another story.

Comparing JIT Misses The Main Point ! by FrankW · 2000-01-27 02:35 · Score: 2

Hmm, I think the author missed the REAL advantage of Code Morphing (or Binary Retranslation).

JIT Compilers (or even FX32!) are pieces of software running ABOVE the OS layer. They are executed for a specific application which runs on top of it (that is the application on top of the JIT environment).

Yes there are similarities between Code Morphing and JIT compilation, as each does some profiling to get better performance in frequently executed code areas. It is further true, that on more predictable and orthogonal platforms like the Alpha, all kinds of optimisation depend on knowing the underlying platform.

BUT they only see the code from the application running on top of them.

The processor itself however gets code from MANY SOURCES. The number of processes running on any machine today is quite large. In addition things as shared libraries, OS calls, etc add a pinch of code.
Hence the processor still has to perform all kinds of clever things, trying to resolve dependencies, avoiding stall and stalling at times. For modern multiple Issue machines this gets VERY complicated. It slows down cycle times, increases size (and hence speed) takes up space which could be used for better performing optimisations (larger cache, etc), etc.

This is one of the major headaches for CPU designers these days, as these parts increase unproportionally in complexity when trying to add more functional units, etc.

This is the reason why VLIW (very long instruction words) are used in the next generation processors (i.e Itanuim). In those the compiler does all the hard work, resolving all dependencies stalls, etc, at compile time, and then simply sending one long instruction to the processor, which includes one instruction for every functional unit. This means the processor can run at MUCH Higher speed, as it loses a big overhead complexity.

This is the reason why TRANSMETA uses such a VLIW, it enables them to build fast, small and CHEAP processors which can run at quite high speeds. (I am sure they could do more than 500Mhz, but then cost and power consumption get in the way. And the are much smaller and havent (yet) got the man resources as Intel, AMD)

Now where does code morphing come into play ?
I recently saw a few comments asking for access to the VLIW instruction directly, bypassing the code morphing. Well, that would DEFEAT the concept !

The Big Thing about code morphing is that it sees ALL instructions which get executed, in the right order. This includes the library calls, the OS routines, etc. Hence it can do MUCH BETTER optimization than ANY Compiler by definition can do. A compiler can only know about the program it compiles (yes you can do clever profiling, but you still only have the application source to optimize), it cannot optimize ACROSS APPLICATIONS.

The code morphing software has absolut knowledge about what code has executed and what code currently tries to execute. This enables it to do much better optimizations.
Furthermore it is SPECIFIC to the underlying processor. It knows everything about the processor and even more important, the processor is designed specifically for it !
(Hence the two versions are internally comletely different, each optimised for the kind of code linux/Win it is executing)

This is what makes the concept so interesting. It adds an additional abstraction layer on top of the actual hardware, allowing the underlying implementation to be always up-to-date and optimized for the application (removing all the complexity of having backwards compatible hardware, ask Intel ;-)

Overall, the approach is a MAJOR step forward, IMHO it can be compared to the CISC->RISC transition. It is after all the first implementation, competing with products well understood and optimized for years. It has potential to solve many of the major headaches of modern CPU designs (including COST). Comparing it to JIT is missing the BIG points.

I think it will go a LONG WAY !

Frank

about Java by jilles · 2000-01-26 21:45 · Score: 4

Hi,

What transmeta does is very similar to SUN's hotspot compiler for java. Like Transmeta's code morphing, hotspot compiles and optimzes bytecode on the fly using profiling data to optimize the parts that are executed most.

Your question as to why Java is still slower than C++ is answered in a series of javaworld articles (www.javaworld.com): http://www.javaworld.com/javaworld/jw-11-1999/jw-1 1-performance.html and http://www.javaworld.com/javaworld/jw-12-1999/jw-1 2-performance.html and http://www.javaworld.com/javaworld/jw-02-2000/jw-0 2-performance.html

Basically the problem is that stuff such as memory allocation, garbage collection, synchronization and runtime type checking have a performance price (you get a safer programming environment in return). The articles discusses these performance issues and give some usefull advice to avoid these bottlenecks.

--

Jilles

Not so obvious by Morgaine · 2000-01-26 21:47 · Score: 2

There's local overhead in the instrumentation code, but not a net overhead in the combined instrumentation code plus controlled loop code: the loss due to extra code prior to a loop is far less than the gain resulting from the more efficient loop code produced by introducing that extra instrumentation.

--
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra

Re:Not so obvious by stevew · 2000-01-27 03:02 · Score: 2

Jilles,

Beating a modern compiler is easy...if you have the right backround. For people that came into programming when 16KBytes was ALOT of memory, assembly coding in an efficient manner is like breathing air. For folks that grew up in the era of 16Mb isn't enough to boot my OS, well... hopefully you see my point.

Modern compilers are marevelous pieces of technology...and I've worked with some of the best folks in the field at one time or another (most of these folks are off working at HP or Intel right now on Merced me thinks..) Those same folks could write better code than their compilers produced.

Note - The Merced and it's breathern are considered decendants of the VLIW machines built in the 80's. Alot of the same people are working on Merced that were the engineers at those VLIW startups.

--
Have you compiled your kernel today??

Ultimate optimisation needs better hints by dingbat_hp · 2000-01-26 21:47 · Score: 3

when the programmer has decided that space is more important than speed.

That would seem to be one of the issues where manual coding has the edge. I agree with your general point, but I think there's still work to do before auto-generation is perfect.

Imagine a case where an algorithm could easily optimise either way speed/space - maybe there's a hash table that's going to hold some programmer-controlled depth of the initial search and allowing it to expand would make usage quicker, but eat memory. A human coder would probably know the pattern of usage this routine would get. By knowing the practical number of data items to be encountered, they could optimise the hash size. A compiler can't do this, because the necessary information comes from the overall systemm domain, not just the source code. To allow compilers to compete effectively, we'll need much more subtle optimisation hinting than we currently have; especially that of the form "ignore this block, it's only used once" and "speed like crazy here, and you'll never have more than 5 items loaded simultaneously".

I'm not a compiler / assembler geek, so maybe someone is already doing this ?

Re:Isn't this REALLY the smarter compiler arguemen by stevew · 2000-01-27 02:48 · Score: 2

I'll just say that when the compiler and the machine are already married together, the compiler DOES know about the size of the caches. I'm specifically talking about VLIW hardware as oppossed to a super-scalar machine that has multiple execution units. The main difference between these two classes of machines is that the super-scalar machine has a hardware layer that is doing the instruction scheduling where a VLIW box has the compiler doing that function. Thus I feel your observation is more of an apples and oranges situation.

--
Have you compiled your kernel today??

False figures by joss · 2000-01-27 16:43 · Score: 2

Where the hell did you get these figures ?
What compiler, what OS, what architecture.
They look *completely* out of line with every
other test I've seen. Check out the JGL against STL comparison above for some real numbers.

For a start, what the hell is the difference between integer division in C or C++ - no compiler that I've heard of written in the last 5 years produces different code in these two cases.

Secondly - who remembers to turn on optimization: anybody who cares about performance and isn't completely stupid.

--
http://rareformnewmedia.com/

Real world?!?!? by marcus · 2000-01-27 06:59 · Score: 2

>I can say that they do not offer Java in the
>CS department. This is a reflection in the fact
>that Java is not important for what programmers
>actually do in the world.

Man have you got to get out in the real world! You and your profs that is. Right now Java is my job. A year ago I was coding c and asm on an 8051 derivative for an embedded project. Today I'm building a distributed network management system for a company that builds fiber backbones. Guess what the most universal platform for executing code is? I'll give you a few clues: All their winXX boxes support it. All their *nix boxes support it. Any ideas? It's called a web browser and everyone that might be interested in designing, configuring, maintaining a network, every one of their laptops, desktops, and workstations has one.

Sooo...which platform do we support? What language do we choose? You only get one choice. I suggest that you transfer to a, uh, more modern school ASAP.

--
Good judgement comes from experience, and experience comes from bad judgement.
- W. Wriston, former Citibank CEO

The Final Word by spacewalrus · 2000-01-27 07:15 · Score: 3

This whole question boils down to whether or not a human mind is deterministic. We can prove that a
compiler is, or in fact that any program that runs on current computer hardware is.

If a human mind is deterministic then it follows that a human could do just as good as a compiler simply by using all of the same algorithms to generate the code, although the computer would certainly finish alot sooner.

If a human mind is NOT deterministic then it is clear that a person who had no programming skill whatsoever could beat the best compiler given enough time, a specification sheet of the language he was to generate the given algorithims in and a circit diagram of the cpu he/she was coding for.

This is VERY simple to see. A compiler can NEVER
under any conditions for any reason EVER EVER EVER produce better code than a human because the
human can ALWAYS produce the same code that the compiler did. The question should be is it reasonable to wait the 150 years of time it would take me to compile mozzila by hand or should I build a compiler to do it in a few hours. The next question would be how much time(if any) should I spend hand optimizing the compiler output.

A question I have is why cant we do the same thing that a JIT compiler does before runtime? It seems
clear that you could easily get very close to as much optimization from a multi pass profiling compiler as you ever could with a JIT without the runtime overhead.

Later
Spacewalrus

Re:No. Hand coded assembler is faster. by mikera · 2000-01-27 02:59 · Score: 2

Hmmm.... back in the days when I used to dabble in a bit of demo coding I would frequently benchmark my 486 code against the (Borland) compilers.

I almost inevitably won. Hands down.

Compilers may have come on a bit since then, but I'd still fancy my chances. Of course, I would only dream of doing this kind of optimization for the most critical inner loops. Most of the time it isn't worth it.

But I would like to stress the point that a human with good knowledge of the system and the problem at hand should be able to beat a compiler in many cases. Compilers can't think laterally.

For example, consider a function that must fill an array of bytes with consecutive numbers between X and Y.

Compiler produces beautifully optimized loop that increments a counter, moves an array index and sets the bytes appropriately. If it's clever, it might use the same register as both an index and a loop variable. No problems. Pretty fast, but....

Joe Hacker takes one look at the problem, sets a 32-bit register to contain the bytes X, X+1, X+2, X+3 and does a quarter of the number of loops just adding 04040404 hex to the register each time.

Joe wins, because he *knows* that most of the time X and Z will be sufficiently far apart for his bit of hackery to be quicker. His knowledge of the nature of the problem will beat the compiler.

This is the kind of thing that humans will be better at for the time being. Once AI compilers start to make an appearance, things may change. But right now, I'll bet on human intuition any day of the week.

This is so true, but it's getting harder by Dominic_Mazzoni · 2000-01-26 21:57 · Score: 3

I completely agree that humans can always write better assembly than compilers can generate, but it's getting harder.

Not all that long ago, compilers weren't all that smart, and processors were a lot simpler. Today processors use tricks like out-of-order execution and branch prediction, making it extremely difficult for an assembly-language programmer to determine exactly what is going on. It still remains the case, though, that the programmer may know something about the code that the processor doesn't (that a register will never contain a certain value, or that some execution paths are much more likely or more critical than others) which will give the programmer the edge.

None of the just-in-time compilers that I am aware of recompile code as they go. If the Crusoe actually attempts to optimize sections of the code based on runtime profiling information, this could be the main reason it performs so well. There have been academic papers written on this idea before but this would be the first implementation I've heard of.

here's how you do it by hoss10 · 2000-01-26 21:58 · Score: 3

I was about to try and answer your question but it's bloody difficult to explain!
With garbage collection implemented in the best way the JIT or whatever doesn't try to hard to keep a record of what mem has been allocated.
But, whenever the system isn't paricularly busy or if it panics due to low memory the JIT can traverse all the objects that have been allocated (just starting with each of the Thread objects I think) just by checking the references each object has until it has a list of all the objects in the system. There are a few reasons this is more reliable (not just faster) than maintaining a reference count.
Then I suppose it (the JIT) could ask the OS to rearrange it's memory mapping thingies (Page Translation whatchamacallems) to put any of it's user space pages after the current position of the stack or something like that (hard to describe but i know what i'm talking about (even though i can't remember the name exactly!))

Basically it can be made to work but maybe not particular efficiently because it could result in a full page (4K) been used for just one small object but this can also be fixed

Isn't this REALLY the smarter compiler arguement? by stevew · 2000-01-26 21:59 · Score: 5

Years ago I worked at a company that was going to bring VLIW to the commercial world. One of the things they had to invent was MUCH smarter compiler technology. That was in 1985, so the basic ideas involved here are at least that old.

Anyway, when the compiler DOES know the hardware, especially VLIW machines, they can indeed do a job as GOOD as a human. (Look - they won't usually do better jobs, when the guys who designed the machine write code, they're going to do what the compiler is trying to emulate..) But from what I remember, static profiling was considered "good enough" back then if the VLIW machine is built right.

Most codes spend most of their time in inner most loops. If those loops can be rolled up correctly for VLIW you can be executing different interations of the loop within the same instruction slot. Being able to do this is usually the largest performance payoff. If you have a GOOD compiler in the first place, that KNOWS the hardware, then JIT and morphing aren't needed, or help.

The place that I see the morphing being a step forward is that VLIW machines were considered architectural dead-ends until just now. If I use one of these smart compilers for a machine with say 7 functional units, the code that emitted won't work on a machine with 8 functional units, or at least not be optimized any longer. These machines didn't scale well until now! Code-morphing technology completely removes this limitation!

--
Have you compiled your kernel today??

a point about malloc/free vs. garbage collection by TheDullBlade · 2000-01-26 22:02 · Score: 2

Actually, there is no absolute requirement that free() does all the cleanup work that the allocator, any more than a garbage collector must call a routine to do all of its cleanup work at the end of every block and at every assignment.

free() could just add the freed chunk to a list of chunks to be cleaned up. A second thread could run continuously tidying up all these chunks with the same system-wide view as a garbage collector.

--
/.

Considerable problems by Anonymous Coward · 2000-01-26 22:02 · Score: 2

I can see two major problems offhand:

The first is that java is, conventionally, a
language which runs in a virtual machine. The
infamous JVM. This involves a layer of
inefficiency from the word go. Granted, you could
avoid this problem by compiling direct for the
CPU in question, but this is a nontrivial
conversion from a stack-based language to a
register-based architecture. If you are on a
java-optimised CPU, compiling direct, this could
still be pretty good.

The second objection involves the compiler.
Compilers take CPU cycles. The smarter they are,
the more they take. If you compile beforehand,
this of course doesn't impact execution speed.
If, of course, you'd like to compile on the fly,
well, you're going to have to execute the compiler.

Java also isn't really all that suited to this,
to be frank. If you want to see a REALLY powerful
model for this sort of thing, try using a FORTH
compiler. On a CPU which is at all stack-friendly
with a good execution environment you can pull
all kinds of nifty tricks (on the fly modification
of running code, assembler-like low level activity
and so on) which can really speed things up
immensely.

java/jit speed vs asm by TummyX · 2000-01-26 22:03 · Score: 2

So while it might seem profoundly counter-intuitive can anyone actually give me a good reason why Java + JIT should be slower than Good Programmertm + Assembler?

There are many things that slow java down, that I'm sure you're aware of, but I'll outline some of them here as they come to me :).

Limited stack, all java objects are created on the heap, and thus is considerably slower. Microsoft Research actually have a division working on speeding up java by putting some objects onto the thread stack.

Runtime checking, array bounds checking etc can not be turned off. Good for stability, bad for speed.

Late binding, can be rather slow :), it's always best avoided when using java, but sometimes you can't. This affects other languages too ofcourse.

Coding practise. I don't know about many people here, but when I program in C, I usually know exactly what's going in memory, so I spend a bit more time optimizing this and that saving a few extra cycles, especially if the code is in a loop. I'm sure this is even more true of most ASM programmers. In java, I just don't care, I don't really know what's going on in memory, and I don't really care, I just do what makes the tidiest code, I also tend to use inheritance a bit more :) hmmm.

Also, java's thread synchronization is easy, but extraordinarily slow compared to what you could do if you were to use Win32 for example, since you have a lot more control. A JIT is not going to help that. Come to think of that, that's where JIT can fail with high level languages, there are things that these languages restrict you from doing, which prevent such cool optimizations that a JIT couldn't possibly do - I'm not talking about instruction level optmization, I'm talking about things like being able to manipulate memory directly.

The idea that Java + JIT can be faster than traditional ASM is interesting, but never will happen I'm afraid :P. It's nice throwing around O notations about how much more efficient garbage collection can be etc, but you have to realise than you're assuming that memory freeing is the most costly process, it may not be. With JIT, GC etc, you have the additional overhead of the JIT and GC. I haven't seen any proof so far that these overheads aren't overheads at all.

There are additional 'optimizations' that can be done at runtime, but does using a JIT really improve performance (taking into account the overhead of the JIT)? How many cases are there when a compiler can not optimize the code as much as a JIT, and the JIT can actually give any kind of improvement. I doubt there are many, and unless the improvements are like 10% improvements, I doubt it would matter much. What's the diff between a 433Mhz and a 476Mhz celeron? Nothing any average user would notice. The only place you would notice the speed difference is when you're running something like VMWare, 3DSMax or compiling Linux....and in those cases would a JIT improve or just hinder performance?

Ofcourse, as we get faster, and faster processors, we can move to higher and higher level languages like Java. They drastically reduce development times, and in many cases, will be the better choice. I doubt you'll see Quake or any OS written in Java anytime soon....hell, I doubt you'll see Office or CorelDraw written in java anytime soon ;).

Crusoe is fast and super cool, but it's not as fast as a real 666Mhz x86. And if Transmeta adds official java support, I'd love to get my hands on one :)~.

I should also stress here, I have nothing again'st java/jit etc. There're all cool stuff, but I think the concept of Java beating good programmer + asm a bit, uh, unrealistic :).

Transmeta Speed by mohaine · 2000-01-26 22:03 · Score: 2

If one Transmeta CPU, running the morphing software and actual productive code, is on par with a PIII 500 Mhz, what would the speed improvements be if you used 2 CPUs? One dedicated to morphing, and one dedicated to productive code? Design would be almost as easy as current design, and speed could double.
Best part of this approach is that it still appears that your system has a single CPU. Win9x will still work.

--
(appended to the end of comments you post, 120 chars)

JIT are compilers by gmarceau · 2000-01-26 22:04 · Score: 2

Almost everything that a compiler need to do to optimize code structure un undecidable. Human brains kick-ass at undecideble problems. Almost anyting that pertain to instruction scheduling is NP complete. Human brains kick-ass at NP-complete problems

Once a programmer has infered some property about his program, he needs a highi-level language to express his solution to the compiler. ML's typing system is the best thing I know that acheives that.

Once a program has figure a fast instruction ordering, he need a low-level language to express his solution in. We all know what kind of language does that.

But I don't know any languges that combines both.

As for todays' question : if JITs looks cool, and one time compiles looks boring, think of compiling n-times with a profile session in between and you will get much better result. (all the sugar, no r.time JIT overhead)

-

--
This post was compiled with `% gec -O`. email me if you need the sources

Re:JIT are compilers by scheme · 2000-01-26 22:36 · Score: 2

Human brains kick-ass at undecideble problems. Almost anyting that pertain to instruction scheduling is NP complete. Human brains kick-ass at NP-complete problems

People aren't really kick ass at undecidable problems. People aren't able to decide whether certain programs halt or not. If the program is trivial then a computer can decide whether it halts or not probably with the same effectiveness as a person. The undecidability problem just says that there is no algorithm that can figure out whether all programs halt or not. An algorithm may be able to decide whether a certain subclasses of programs halt or not.

In addition, people most certainly do not kick ass at NP-complete problems. To solve any problem, we need an algorithm to do it and all such algorithms are nonpolynomial. If anything people would be slower at solving these problems since we can't compute things as quickly as computers.

--
"When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it

Experience from Optimizing Java ... by joe_fish · 2000-01-26 22:05 · Score: 3

The theory looks sound, but one of the biggest things to do when optimizing any Java program is to minimize the calls to new().
I recently sped a program up by 150% in a snap simply by killing a few new()s. Swing and LotusXSL have had similar experiences.

I think that part of the problem is that all of this is new, so there is more to do. HotSpot the trendy JIT from Sun in places IS ALREADY FASTER THAN C, but whenever it comes to Object creation, things slow down a lot.

So why in theory should new() be quick when in fact it is slow? IMHO the problem is not with the memory claiming, its with all the other stuff that the JVM has to do.

When I call new Foo() the JVM:

Checks to see if the bytecode for Foo already exists, and if not it loads it, verifies it, and calls the class init method.
This is very very slow, but should only be done once.
Allocs new memory. Probably very quick
Calls the hierachy of constructors of all Foos superclasses. Quite slow.
(For advanced garbage collectors) Place the object on the 'recent items' list. Probably quick

So I guess the complexity of the system as a whole is the problem here.

--
DWR is Ajax for Java

Re:Experience from Optimizing Java ... by ucblockhead · 2000-01-27 10:57 · Score: 2

Yes, but the key is that if you allocate from the heap, you have to search some sort of free memory chain to find something. This takes time. To allocate something off of the stack, you just increase the stack pointer.

If you do benchmarks, you'll find that this (At least, you would have last I checked, which was with an old DOS compiler):

Foobar f1(1,2,3);

is dramatically faster than this:

Foobar* f1 = new Foobar(1,2,3);

for the reasons mentioned above.

This is because the former looks something like this in psuedocode:

f1 = StackPtr
StackPtr += sizeof(Foobar)
f1.Constructor(1,2,3)

while the latter generates something like:

Start at beginning of free block chain
for each block
is it big enough?
if so, is it too big?
if so, split it and return beginning
otherwise return it

f1 = returned block
f1.Constructor(1,2,3)

Where this really gets compelling is if you a class has member classes which have member classes, if those members are all allocated, you end up with lots of little chunks of memory. On the other hand, if they are all regular members, and not pointers to things that are allocated, they just become part of the parent object's size.

If you have this:

class A
{
class B {} b;
class C {} c;
};

then this line gets memory from the stack merely by incrementing the stack pointer by sizeof(A):

A a;

This is basically one assembly op which, unless there is a page fault, is nothing, timewise. On the other hand, this:

class A
{
class B {} *b = new B;
class C {} *c = new C;
};

Created like this:

A *a = new A;

Requires three searchs through the heap to find memory. Not only will this take more time, but this is more likely to generate a page fault, both because you might not find contiguous memory, but also because each heap object has overhead associated with it (the list of free and used memory blocks) while each stack object usually does not.

(The other reason to prefer the former sort of class is that even if you decide to use the heap for the owning class, you use a bigger chunk, allocating only once for the class rather than for each memory object.)

--
The cake is a pie
Re:Experience from Optimizing Java ... by ucblockhead · 2000-01-26 23:02 · Score: 2

One huge advantage C++ has here is that you can declare on object on the stack. This makes your fourth bullet go away and turns the second bullet into something like two asm instructions.

(And, of course, there is no bytecode to load, so the first bullet becomes merely "call the constructor", whether the stack or the heap is used.)

Even in C++, the heap is overused. With moderately careful design, the majority of objects can be left on the stack. I'm convinced that one of the reasons that people think C++ is so much slower than C is that many C++ programmers don't think about this enough.

Calls the hierachy of constructors of all Foos superclasses. Quite slow.

What strikes me here is that part of the problem is these huge object hierarchies that people seem to love so much. A very flat hierarchy reduces this quite a bit.

--
The cake is a pie

Sounds good, but... by randolph · 2000-01-27 09:01 · Score: 2

I once took a course with Dr. Alfred Aho, one of the authors of The Dragon Book and a contributor to lex, yacc and awk (he's the 'a' in awk.) Dr. Aho would never call a compiler component an "optimizer;" instead, he called it a "transmogrifier," because the only way a compiler could produce truly optimal code would be if it could write an optimal program--and it can't.

While the compiler has the opportunity to rewrite the code every time it processes it, it's not very good at rewriting code, so its abilities are limited.

A well-designed compiler on a well-designed CPU can produce very good assembler code--I doubt I could outsmart a compiler in a RISC assembler optimization problem--but I believe that code tweaking is usually of value only in inner loops and for certain kinds of data storage.

All of which adds up to the reality that, pending the arrival of artifical intelligence, human programmers are still needed.

The C feature with "int" by TheDullBlade · 2000-01-26 22:08 · Score: 2

The basic data types are defined. You can use sizeof() to generate 100% portable code when you really have to know exactly how bit your ints are. More importantly, you can use your noggin and avoid it being a problem in most cases by writing code that accepts the range of possible value limits.

Embedded systems programmers will back me up on this one: it's a Good Thing that when you write C you can write one piece of code that's perfectly happy to use a 16-bit, 20-bit, or 32-bit int and it will be equally efficient on any of those systems.

--
/.

Re:The C feature with "int" by ucblockhead · 2000-01-26 22:29 · Score: 2

The trouble is that you can't always use sizeof() and you can't always change your types around. The classic case (which I ran into, painfully) is when you have two different machines with two different architectures passing structures around.

It is easy enough to define a structure like this:

struct
{
int Foo1;
int Foo2;
};

and yes, regardless of the machine, this will compile and run most anywhere, assuming that Foo1 and Foo2 are never larger than 2^15. The trouble is, if you are using that structure two places, one on an 8-bit machine, another on a system that is being upgraded from a 16-bit OS to a 32 bit OS, well, you have pain. (An unfortunately all too real situation I got stuck with.)

Now in my situation, you can certainly say that the idiot authors should have used shorts (and they should have), but even then, there is no guarantee that a short will be the same size in both places. I foresee a lot of pain when sizeof(long) stops equalling sizeof(int) as we move to 64 bits.

So what you end up doing is having the structure defined twice, potentially differently, for each side of the pipe. Not fun to maintain.

(And don't get me started on packing.)

--
The cake is a pie

Self-Modifying Code by Tune · 2000-01-26 22:10 · Score: 3

I believe Andy has a good point in arguing that there's no fundamental reason for manually written assembly being better than automatically self-optimizing stuff. I also believe that manually written assembly will ultimately become obsolete.

Nevertheless I'd like to point out that JIT is based a somewhat dangerous technique: a program that alters (its own) code. I believe this technique was used in the eighties to scare off hackers by making code incomprehensible and hard to disassemble until the program was actually running. Also (even on a good old 6502 processor) it's possible to make some speed improvents peeking and poking into the code you're actually executing.

When compilers for Microcomputers got faster and most processor architectures (known as Harvard architecture, I believe) explicitly require a division of RW and RO memory segments, self-modifying code was abandoned. Hacking into your own code is generally conceived as bad-programming-practice.

I was wondering wheter JIT techniques also require very intelligent (and thus "heavy") disassemblers? One might also expect that developing a JIT compiler is a lot harder than doing a conventional one (without peep-hole optimization). Does anyone have experience in these subjects?

NNNG! "that the allocator" . " would do" by TheDullBlade · 2000-01-26 22:14 · Score: 2

I hate when I skip words.

--
/.

Cause or effect? by DragonHawk · 2000-01-27 09:49 · Score: 2

The point is that the average application has inflated enormeous over the years.

So has the average machine. Which is cause and which is effect? Do we have bigger applications because with today's faster processors and bigger memories, throwing more hardware at the problem is cheaper then trying to bum five bytes out of a 50 KB program?

Something to think about, anyway.

--

dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.

I think Linux does this by DragonHawk · 2000-01-27 10:04 · Score: 2

Perhaps another malloc comes along wanting the same size (roughly) as the thing just freed. We just give it the recently freed chunk. We then don't have to build a new one.

IIRC from what I have read about the Linux kernel, it does something very like this. Blocks of free memory are kept track of in a hash table, where the hash key is the size of the block. When you free a block, it gets merged with any adjacent blocks and put in the appropriate spot in the hash table.

Or it might have been an array of lists of blocks, where each element in the array represents a list of blocks twice as big as the previous element's list. It's been awhile; I forget. :)

--

dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.

Bless you! by aheitner · 2000-01-26 22:20 · Score: 3

@#$% theory.

I can write fast C (or C++ if you stay away from the evil slow features) code.

Everything I do in Java is bloody slow.

I do believe fundamentally that some sort of per-machine compilation is in fact ideal -- the C compiler could do clever stuff if it new the exact cache architecture of your machine and compiled your programs from some intermediate (say parsed and desymbolized) representation when you ran them (or maybe just the 1st time you ran them).

(all the slackware people in the crowd say "bwaahahahaha! I did compile my programs for my own machine!" :-)

But I think the dumbness of Java -- the funky GC, the dumb dumb dumb RTTI, will always make it slow. For that matter, Java uses a dumb fake-o instruction set that's not a particularly good representation of the information needed by a compiler. Why not give your JITC something closer to the source rather than a binary for a machine that doesn't even exist ?!?

...

Ditzel said something very significant when he introduced Crusoe: (i'm paraphrasing) "The reason we haven't talked openly about this before is we didn't want to boast before we could prove we've done it". Ditzel stood there and said "We've got a chip that does on-chip JIT architecture translation, and does it to run as fast as any other chip out there, and here's the proof".

The Java people have been claiming for a long time that they could write faster code than the C compiler with JITC.

Java has gotten better than when it started, it's true.

But it still can't hold a candle to gcc -O2.

My message to the Java world: I spend too much on my computers to waste time running slow code. Call me back when you can actually demonstrate your claims.

Re:Speed vs. Convenience by arivanov · 2000-01-26 22:21 · Score: 2

Doesn't "fast enough" outweigh "fastest possible" if you get it sooner?.

Yes it does but only if this is something you want to run not very often. If it is running constantly - think kernel, daemons, etc it does not outweight a thing.

Also perl or python still beat the sh... out of Java on anything and develkoping in either one of those is much faster. What I actually want to see finally is a perl.so or python.so plugin that loads perl (or python perse) as an application and does policing on all dangerous IO functions.

After all browser functionality is just an elementary module. Than the entire jabba the hut issue will be finally put where it belongs - to rest.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/

Re:Isn't this REALLY the smarter compiler arguemen by MikeBabcock · 2000-01-26 22:21 · Score: 2

One of the areas that run-time profiling adds to the mix is the ability of the code to adjust "itself" based on processor cache sizes, etc. If I run a program on my K6-200 and then on a K7-700, the program should have loops optimised very differently. That is, my chip has very little cache in comparison to the K7. The K7 could have a large number of long instruction sequences unrolled that would be in L1 cache whereas it would be smarter, if branch prediction works out (again, run-time profiling) to keep those loops rolled up on my CPU.

--
- Michael T. Babcock (Yes, I blog)

Re:within X% of the performance of native code by MikeBabcock · 2000-01-26 22:23 · Score: 2

Which reminds me of how big a smile I got when I read Transmeta's processor marketing where they said that efficiency was more important to them than performance and it would not be as fast as a native CPU, but close enough to cut it.

--
- Michael T. Babcock (Yes, I blog)

Java specifics by Tal+Cohen · 2000-01-26 22:24 · Score: 2

I don't want to argue about the general case, but as things relate to current-technology Java, JVMs and JITs, here are some things to consider:

a. One of the most important optimization techniques is inlining. In OO, you can't inline unless you're sure a call is monomorphic (static). Java is even worse than the general case OO -- a Java program simply cannot use inlining in the vast majority of cases. Here's why: To inline a call in OOP, you must know for 100% that the distatch site is monomorphic (i.e. that the call is never polymorphic). But in Java, even if the compiler finds out that a certain method is never replaced by inheritance (overriden) in any class in the program, the program could load new classes at runtime (using Java's Reflection mechanism), thus turning monomorphic call sites into potentially polymorphic ones.

(A few exceptions to this rule: first, there are final methods, or methods in final classes, that cannot be overridden. Second, there are Sealed JAR files, in which all package-level (default access level) methods cannot be overriden. IBM's JIT takes advantage of that, a fact which allows for great optimizations in the standard Java libraries.)

b. Sun's JVM (the one for Windows, at least) implements references by a dual layer: the value stored in the reference fields is an index into an array maintained by the JVM, which contains the actual pointer to the memory block. This allows the memory manager to move blocks around without having to update all references to that block, but it also means each dereference actually requires two dereference operations. So when you write "this.field1.field2", you think you're making two dereferences but actually the machine makes four.

c. The JVM is a stack machine. This means that naive implementations on register machines result in slow operations, that simply do not take proper advantage of the native architecture even when compiled to native code. Microsoft was the first to realize this, and indeed their JVMs were the fastest at the time (later on IBM's JVM took that trophy).

Finally, a word or two on the general case: anything a JIT can do can also be done by a regular compiler, if you allow it to use profiling data from a few "typical" runs of the program (but note that not all programs have "typical" runs). The JIT, however, consumes memory at runtime, unlike a regular compiler - and with programs that require more memory than is physically available (practically all programs today), extra memory requirements reduce speed since they increase paging; so the JIT has its price.

--
- Tal Cohen

java JIT itself is not that slow. by monac · 2000-01-26 22:26 · Score: 4

As shown at http://www.idiom.com/~zilla/Computer/javaCbenchmar k.html, even the linux jdk, which is relatively slow than other platform's, runs as fast as natively compiled C program when runs simple operations.

In my understanding, java's poor performance is due to following factors:

1. too generized api design resulted in very deep call stack. for example, printing current date and time using java.text.DateFormat uses much more calls than traditional C (a system call and a printf is enough for C). It gets more terrible in swing. use of interfaces causes the same problem i think.

2. JNI is too slow. (can't understand why but it is known to be slow and java can't help but using many JNIs)

3. as WORA is important in java, it is hard for java to use any platform specific resources such as Graphic card's accelerations. that is one of the reasons swing's design is getting ugly as it tries to boost its performance.

4. too many small class files result in high I/O usage when JVM loads classes.

5. no MACRO! for example, most of java API is synchronized, which is expensive, reguardless thread is used or not. the same for security check codes and platform specific implementation codes,
etc. these problems can be solved by MACRO but...
someone in sun doesn't like macro and i agree with him when think of java's purity.

some of these problems are known to be solved in next jdk version(hotspot client version). slow synchronizations and class loading speed, etc.
but mostly, they are java's design issue and never be solved, as java is mostly running in virtual machine and originally it is designed to be used for java chips, only when JVM is implemented in processor level, the speed problem will be solved.

I'm looking forward to seeing java running in MAJC or Transmeta's chips.

please correct me if i'm wrong in some and forgive me for my poor english.
Cheers.

--
-- Y. J. Chun

Ooo, ooo, start on packing! by TheDullBlade · 2000-01-26 22:47 · Score: 2

Seriously, when you communicate over a inter-machine connection, you have to comply with the communication standard.

If the communciation protocol is, for example, byte based and you are writing portable code, you have to have a formal, portable method of chopping up that struct into bytes at the sending end and building it back up again (from bytes) at the receiving end. It's called serializing, and it's disgusting laziness to skip it; if you do, you deserve those problems, and also to get bleeding genital warts.

Just dumping the bits of a fuzzily-defined internal data structure outside of the comfy innards of your own program is the ultimate in programmer carelessness (well, dividing by a null pointer and writing to the dereferenced quotient, with appropriate casts, is the ultimate in carelessness, but you get the point). Would you also send your pointers across a network and expect them to work at the other end?

--
/.

The "C is not portable" myth by Bj�rn+Stenberg · 2000-01-26 22:51 · Score: 2

> You cannot write portable programs if your basic data types are not defined.

This has to be one of the most widespread, and possibly destructive, myths about C programming. The truth is, if you write code that depends on the size of your basic types, your code will be less portable. The next generation machines will always handle your types in ways you will fail to predict.

> And it also doesnt give todays hardware any advantage.

Have you ever worked with a PowerPC processor? Some of these cores do not have 16-bit integers, only 32-bits. Yet, lots of people reasoning like you use them anyway (because "16 bits is enough for this value"). This forces the compiler to insert 16-bit emulation code, thus increasing code size and decreasing execution speed. Had these people ignored the size of their integers and just used 'int', the code would have been both faster and more portable!

Take a look at a few of the most widely ported software packages around. How many of these redefine the types in order to "know the size of their types"? Not many, I can tell you.

Writing portable code has little to do with worrying about the size of your integers, and lots to do with not trying to optimize your code to whatever system you happen to write the code on (by, for example, worrying about what the size of your integers are...)

Its the libraries stupid by joss · 2000-01-26 22:52 · Score: 2

The difference between theory and practice is that in theory there is no difference whilst in practice there is.

First off JIT is not java specific. Transmetas chip does a form of JIT on everything - even x86 assembly code. Your basic argument that JIT can be efficient is correct (not more efficient than humans, but I'll come to that).

The performance hit from Java comes far more from it's crappy libraries (Swing anybody) than the JIT aspects. Bounds checking, garbage collection, and inherent overhead of OO techniques all cause a performance hit, but the real trouble comes from crappy libraries written by crappy programmers who put 10 levels of indirection between a request and it's implementation.

For instance the String class being written with 16 bit characters to make it unicode friendly when 98% of the java code written never uses this feature but still requires twice the time and space for every operation. Also Strings are not alterable so to alter a character you create a new StringBuffer object, copy everything, make an adjustment, then replace the old reference with the new so the old object eventually gets passed to garbage collection... all compared to
buf[i] = 'a'; (C or C++). It's crap like this that makes Java seem so slow, not the JIT hit.

Your argument about how JIT can outperform programmers because it has more information is flawed - any decent programmer will know where the spikes are and optimize accordingly. He will know what he can and can't do with the data, whilst the compiler has to take the strictest possible interpretation. Advanced optimization is too slow to do on the fly (although you can optimize the byte code to some extent with a decent javac, then optimize the translation with a JIT)

Garbage collection argument is silly too. In C/C++ or assembly, the programmer has a choice as to whether he uses the stack or the heap. It makes his life more complicated, but as long as he knows what he's doing it's inherently more efficient. If he doesn't know what he's doing then he's useless in any langauge. If you want to, you can use a garbage collector in assembly or C++, but it's not the only technique available to you.

Basically Java is slow for the same reason that Windows sys admin is slower than Linux sys admin.
You don't trust the user with the rm command or trust the programmer with pointers - show him an idiot box if he tries to delete a file or bounds check that array access for him. It's one way of doing things, but it sure isn't efficient.

--
http://rareformnewmedia.com/

that's what longs are for by TheDullBlade · 2000-01-26 22:29 · Score: 2

And if you need really big numbers you can always code your own BigInt structs, though you might want to write a little preprocessor to do that (or use C++ and redefine all the ops for it).

--
/.

Re:a point about malloc/free vs. garbage collectio by ucblockhead · 2000-01-26 22:55 · Score: 2

Or maybe it only bothers to do this when resources get tight. Perhaps another malloc comes along wanting the same size (roughly) as the thing just freed. We just give it the recently freed chunk. We then don't have to build a new one.

Caveat: I have no idea if this makes sense, performance-wise, in the real world.

--
The cake is a pie

C is not for pansies by TheDullBlade · 2000-01-26 22:59 · Score: 2

C is for efficient coding by trained experts, with the potential for portability if that is a desired goal. It provides no hand-holding that would interfere with run-time efficiency. The default behavior is and should be the most efficient behavior.

There are plenty of other languages for people who are willing to sacrifice efficiency for more hand-holding.

--
/.

GC by joss · 2000-01-27 00:21 · Score: 2

> it's particularly nice in large and sophisticated object models where you almost always lose track of who has a reference to what

Does this make anyone else nervous as heck. If you can't even keep track of when your objects should be deleted, how the hell do you keep track of anything. Increasing complexity requires more clarity of thought and design, not less. It's not the overhead of GC that bothers me, it's the sloppiness of thought it seems to encourage (try using JBuilder with less than 256Meg if you want to see what I mean).

In some languages (eg ML), you really are working at a different level of abstraction and then GC makes perfect sense, but it's not GC explicity that makes life easier.

--
http://rareformnewmedia.com/

Re:What about the JIT? by Sketch · 2000-01-27 00:29 · Score: 2

"I still don't like the idea that they're keeping the instruction sets closed. It would seem like if someone out there wanted to port GCC to Crusoes native instructions, that would be good... But they just don't want to be percieved as being at all incompatible with Intel, i guees.

The reason has nothing to do with Intel compatibility, it has to do with the nature of VLIW. As someone else mentioned above, every new VLIW CPU from Transmeta will be different (the two CPUs they announced have different cores) and require totally different optimizing, possibly even having a different instruction set.

What Transmeta has done is remove the primary problem of VLIW...the fact that you need a very specific compiler for your CPU to extract performance out of it. By including the "compiler" in the CPU, it's done for you at run time, so you won't have to recompile your code every time you get a new CPU.

OpenVerse Visual Chat: http://openverse.org

--
-- OpenVerse Visual Chat: http://openverse.com

Re:within X% of the performance of native code by spinkham · 2000-01-26 23:04 · Score: 2

Well, Crusoe was designed for low power use, then they tried to get speed out of it..
"Efficiency"=Power efficiency, not speed efficiency.
It is slower then other x86 chips per mhz, but paster per watt. Therefore, they have done a good job at what they wanted to do. No, it's not an alpha or a k7, but it wasn't designed to be.

--
Blessed are the pessimists, for they have made backups.

Slashdot Mirror

Transmeta Code Morphing != Just In Time

129 of 454 comments (clear)