Good points about the film industry. But I still maintain that the recording studios often DO have good original source material available. In any event, even if the studios only own a few, they still own them and would not release them if the (almost certain) losses were not covered by the promotion of mainstream artists.
You greatly underestimate the importance of reissues in financing new projects. A reissue is lucky if it breaks even. The market for reissues is a tiny fraction of the market for new albums. How many reissues go gold? Ah! The ones that include never-before-released material, whose copyright starts when they are first published.
And what's the problem here? The studios made the recording. What's wrong with publishing it and holding the copyright? That's what copyright is for.
The almost limitless extension to copyright that we've seen is another issue entirely. I agree that this sort of thing is quite harmful. What I'm arguing is that the big labels do indeed provide a valuable service and that that service is dependent upon mass marketing of "mediocre" pop.
Besides, if some collector were to put out a fantastic release of a public domain Ellington work, what's to stop the record companies from taking that version, adding their own unreleased material, and selling their own compilation?
Nothing. That's what public domain means. I don't see a problem here. Can the collector copyright the release if sufficient sound processing has been done to create added value?
If we didn't have functionally perpetual copyright, we wouldn't have to depend on the major labels for reissues of the Ellingtons and Armstrongs.
But would we have it in the same quality that we do now? How many people can afford complex sound filtering hardware and software and assemble a collection of records produced in the 20's and 30's? I've bought reissue discs from France and reissue discs from RCA/BMG. Guess which sound better? The labels also have access to the originally cut parts, which makes an enormous difference.
And there's still the point of the mainstream artists offsetting the costs of introducing new artists who may or may not become wildly popular.
You forgot one aspect of the labels that's very important:
9) The label leverages the money earned from mainstream artists to support niche and break-in markets.
If we didn't have the Metallicas, New Kids and Milli Vanillis, we wouldn't have the reissues of the Ellingtons and Armstrongs. We wouldn't have bebop, hard bop or acid jazz. We wouldn't have Ska or Grunge either. We wouldn't have the up-and-coming artists who might become the next mainstream.
It's a very subtle point that I think a lot of people miss.
Actually, the spiral-bound "New Real Books" are published by Sher Music, publishers of the original (illegal) Real Books. In my experience, the changes are more correct than in the illegal versions, though there are still plenty of mistakes (note: I'm much more into traditional standard swing than bebop, etc.). And I much prefer the calligraphy of the New Real Book over the old Real Book.
I've never found a set of more correct Fake Books than the New Real Book series.
I just picked up the Standards Real Book last month. Terrific stuff!
Well, then, you'd better make sure your bank has a full disclosure agreement with IBM, because they've been doing this sort of thing for years. Each instruction on the S/390 can trap to microcode and run a software emulation layer. Alpha does the same thing to implement the VAX ISA.
And again, there is no point in compiling to the Crusoe bare metal. End of story.
I agree that it would be neat to hack up the code morpher. But then, there is nothing preventing anyone from designing a similar system from scratch. If Compaq can do it, why not OSS? I don't think there's enough demand, though.
Actually, it's debatable whether a frontend from anything other than the x86 would realize any significant performance gains from code morphing.
x86 gets a big boost because of register allocation. All that ugly spill code produced because of the register-limited architecture gets translated into register moves (or eliminated entirely). Unless code for a PPC, MIPS etc. was compiled assuming a lot more than 32 registers (which it can't, due to the instruction format), there won't be much gain here.
Unless Transmeta adds partial evaluation/specialization a la DyC or Tempo, I don't think the benefit of code morphing on, say, a PPC will overcome the cost. I suppose they could do something like Dynamo and implement a software trace cache.
One interesting avenue of research would be to compile to a virtual ISA that included lots of registers and other fancy hardware structures the compiler could use. Taking advantage of new compiler innovations would then be a matter of designing a new ISA and writing the code morpher. Not having to re-do the silicon would be a big savings.
Speaking of which, has Mozilla ever gotten around to creating a sane installer for bonsai/tinderbox/bugzilla/etc.? Last time I checked, installation required lots of code hacking. I have enough hacking to do on tools to debug other tools without fixing tools to manage the hacking of other tools. :)
I of course use CVS in the traditional manner for keeping one large project up-to-date. Lately I have begun experimenting with keeping smaller bits of project in separate modules and using cvs co to construct large projects from small modules.
My CVS repository is somewhat like a source code library in that sense. A project is then simply a collection of CVS modules. Rather than building a binary library full of unrelated code of which only a small subset is used in any one project, I just check out those parts I want for a project.
I even have "maketools" and "textools" modules for checking out a build and documentation environment for each project.
I know lots of projects are organized this way on the source level (e.g. Cygnus has common code directories used across projects), but I wonder if anyone else has tried to create these common code directories from a single repository, rather than copying the code into multiple repositories. At the very least, make becomes really fun! :)
1. Register selection is very easy in most cases. Also, I can't think of a machine that has 10 registers, and anyway the NP-Complete problems require N! time.
You're right, mostly. Register allocation is easy in the sense that graph coloring is fairly fast and is able to roll up allocation, machine idioms and simple copy propagation all in one. It can also do things like web/live range splitting to pack even more variables into one register. Compiler register allocators are pretty damn smart. I doubt a programmer could do better. I didn't follow your "10 registers" comment - what are you getting at?
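To make the graph-coloring idea concrete, here is a minimal sketch, assuming a plain greedy coloring rather than the simplify/select scheme a production allocator would use. The function name and the interference-list layout are made up for illustration: each variable gets the lowest-numbered register not already held by a variable whose live range overlaps it, and -1 marks a variable that would have to be spilled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of register allocation by greedy graph coloring.
// interference[v] lists the variables that are live at the same time as v.
// Returns one register index per variable, or -1 where no register is free
// (a real allocator would insert spill code and retry at that point).
std::vector<int> colorInterferenceGraph(
        const std::vector<std::vector<std::size_t>>& interference,
        int numRegs) {
    std::vector<int> reg(interference.size(), -1);
    for (std::size_t v = 0; v < interference.size(); ++v) {
        // Mark the registers already taken by colored neighbours of v.
        std::vector<bool> taken(static_cast<std::size_t>(numRegs), false);
        for (std::size_t n : interference[v]) {
            if (reg[n] >= 0) taken[static_cast<std::size_t>(reg[n])] = true;
        }
        // Pick the lowest free register, if any.
        for (int r = 0; r < numRegs; ++r) {
            if (!taken[static_cast<std::size_t>(r)]) { reg[v] = r; break; }
        }
    }
    return reg;
}
```

With three variables where only the middle one overlaps the other two, two registers are enough; a triangle of mutually overlapping variables forces a spill with two registers.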
2. But that's beside the point, because on a machine with a decent number of registers (anything but x86), you won't run out during a routine, or if you do, the scheduling will be fairly easy.
This is a widespread misconception about RISC machines in general. 32 registers is really not enough once you start optimizing heavily. Optimizations tend to increase variable lifetimes, sometimes quite dramatically. When the compiler combines partial-redundancy elimination, loop unrolling and register promotion, it quickly saturates a register file of size 32. This doesn't even count FP optimizations. There's a reason Itanium has lots of registers (some of that has to do with EPIC's fundamental nature, of course).
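A small sketch of why unrolling eats registers, written by hand here only to show what the optimizer effectively produces (the function names are illustrative). The simple loop keeps one accumulator live; the version unrolled by four keeps four accumulators live across the whole loop, and that is before any other optimization adds its own temporaries.

```cpp
#include <cstddef>

// One accumulator and one index are live across the loop.
long sumSimple(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Unrolled by four with independent accumulators: four running sums plus the
// index are now live at once, so register demand grows with the unroll factor.
long sumUnrolled(const int* data, std::size_t n) {
    long a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    long total = a0 + a1 + a2 + a3;
    for (; i < n; ++i) total += data[i];  // leftover elements
    return total;
}
```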
3. The programmer will always at least tie any optimizing compiler. Why? Because the programmer uses the compiler. When I last coded in assembly, I was writing a graphical application. 90% of it was in C. I wrote 100% of it in C, looked for the hot spots, saw what the compiler had done to them, and improved it.
How did you improve it? I don't think a programmer can always tie or beat a highly optimizing compiler. There are too many transformations hidden by the syntax of the high-level language. One might think that the hot spots can be coded in assembly to recover that, but it's very, very difficult to optimize for the pipeline, cache, etc. Not to mention non-portable.
4. How did I improve it? I used SIMD instructions (MMX). I was able to get a 4x increase in speed. *NO COMPILER COULD EVER DO THIS*. There are 2 reasons for this:
This has been addressed by another poster. Vectorizing compilers have existed for a long time.
a. C doesn't have the syntax to do pixel manipulation SIMD style. My C code was full of shifts and bitmasks (ands). My assembly (when I re-wrote it) had only 2 shifts (one per section of the unrolled loop), and no bitmasks.
I'll grant that for some applications, especially DSP-type stuff, compiler technology really lags behind. But in general, I think most programmers would find it difficult to beat a good compiler optimizer.
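For readers who haven't seen the shift-and-mask style being discussed, here is a rough sketch in portable C++. It is not the original poster's code and it is not MMX; the pixel layout (four 8-bit channels in a 32-bit word) and the function names are assumptions. The first function handles one channel at a time with shifts and masks; the second averages all four channels in one pass, which is the same idea a SIMD instruction applies to wider registers.

```cpp
#include <cstdint>

// Channel-at-a-time average of two packed 8-bit-per-channel pixels.
std::uint32_t averageChannelwise(std::uint32_t a, std::uint32_t b) {
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        std::uint32_t ca = (a >> shift) & 0xFFu;
        std::uint32_t cb = (b >> shift) & 0xFFu;
        out |= ((ca + cb) / 2) << shift;
    }
    return out;
}

// All four channels at once: floor((x + y) / 2) == (x & y) + ((x ^ y) >> 1).
// The 0xFEFEFEFE mask drops each channel's low bit before the shift so that
// nothing leaks into the neighbouring channel.
std::uint32_t averagePacked(std::uint32_t a, std::uint32_t b) {
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}
```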
b. A compiler doesn't know what the limits are on data that you'll pass to your function. You do. If you know all data will have an even number of pixels (say), you have an advantage over the compiler, which doesn't know this.
Now this is a valid point. The programmer should concentrate on the high-level algorithm, which should take stuff like this into account.
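A sketch of what "knowing the limits on your data" looks like in code, assuming a made-up pixel-inversion routine. The general version has to cope with any count; the second states the even-count guarantee as an assertion and processes two pixels per iteration with no leftover handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// General version: works for any n.
void invertAny(std::uint8_t* px, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        px[i] = static_cast<std::uint8_t>(255 - px[i]);
    }
}

// Caller guarantees an even pixel count, so the loop can take pixels in
// pairs without a cleanup step for an odd trailing element.
void invertPairs(std::uint8_t* px, std::size_t n) {
    assert(n % 2 == 0);  // the programmer's knowledge, written down
    for (std::size_t i = 0; i < n; i += 2) {
        px[i] = static_cast<std::uint8_t>(255 - px[i]);
        px[i + 1] = static_cast<std::uint8_t>(255 - px[i + 1]);
    }
}
```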
Oh, I don't think anyone will argue that. It's always been my belief that more programmers need to understand what's going on under the hood. In my ideal world, every CS and CompE degree would require an advanced computer architecture course (covering pipelining, caching, speculation, etc.), a (maybe advanced) compiler course (covering memory/stack layout, function call conventions/linkage and ABI issues, as well as dataflow analysis and code transformation) and an OS course to tie them all together.
It's been a small dream of mine to put together a 3-course cycle from computer architecture->compilers->OS and have the students build a complete system from scratch. I think students gain the most insight when they see how all the different pieces they learn about fit together and work in a (maybe) harmonious fashion.
Always is a pretty strong term. How many programmers do you know that optimize large applications for code layout, data cache layout and the processor's pipeline structure and branch predictor? How many programmers are even able to do these things? "Clever with assembly" means much more than instruction selection.
And I did ask 'why vectors and not arrays?' I got a (very nice) personal explanation from Steven Heller, who is very professional and open for discussion on any subject in the book;
When you say you got an answer, was that a correspondence with Heller, or was it in the book? I guess my beef was more with the review than the book per se. It's not clear from the review what the book covers and what is left out. The table of contents doesn't help, either. There is certainly a place for an "object-based C" book, as long as it's clear that the full language is not covered and it includes references to well-respected books with more detail.
Nope. If one truly wants the most out of their programs, they will turn on the compiler's optimizer.
Believe me, the compiler can do many more low-level machine-idiom transformations than most programmers can. Programmers should concern themselves with the high-level algorithm and leave CSE's and loop transformations to the compiler. The programmer usually gets in the way with fancy "optimizations."
--
Re:Can someone give 1 good reason to use C++ over
on Who's Afraid Of C++? · Score: 1
Maybe I got trolled, I dunno, but I see comments like this so often that I just have to respond.
Why use C++ over C? Ignoring the fact that C++ is for all practical purposes a superset of C, I'll list the features I find most useful, and the reasons why:
(Mostly) Strong Typing - A lot of people find this restrictive, but in the long run, proper use of datatypes, enums and consts saves debugging time. I find that I get more things "right the first time" in C++.
Encapsulation - Properly protecting your data with private and protected can tell you when you've coded a bad algorithm. It forces you to separate the interface from the implementation. Multiple inheritance can take this one step further, but I don't usually go that route.
Polymorphism - Ok, this is the obvious one, but it really is extremely useful. It's also the hardest part of C++ for C programmers to "get," as it requires a completely different mode of thinking. I know it took me a couple of years and reading of Design Patterns to really, truly understand it. The Visitor pattern was my gateway to understanding.:)
Streams - This really goes along with strong typing and includes things like strings and other fundamental library issues. Streaming is a MUCH nicer way of formatting data than printf, mostly because it's well-typed. No more segfaults from passing an int to a %s format.
Libraries - While C certainly has excellent libraries, C++ includes the STL in the standard, which is a godsend (mostly). I don't want to code hashtables and lists. I did that in college. STL lets me get my work done in a standard way. Now, I have issues with how the STL was standardized (the STL string/iterator/streams interaction really, really bugs me), but overall it is very slick.
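As a quick illustration of the Streams item above, here is a minimal sketch (the Point type and function names are invented for the example). The compiler picks the right operator<< from the argument's type, so there is no format string to get out of sync with the arguments, and user-defined types plug in the same way.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type that knows how to print itself.
struct Point {
    int x;
    int y;
};

std::ostream& operator<<(std::ostream& out, const Point& p) {
    return out << '(' << p.x << ", " << p.y << ')';
}

// The equivalent printf("%s has %s items", name, count) would compile in C
// and misbehave at run time; here the types select the formatting.
std::string describe(const std::string& name, int count, const Point& where) {
    std::ostringstream out;
    out << name << " has " << count << " items at " << where;
    return out.str();
}
```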
Finally, I'd like to address the issue of bloat. C++ does make it much easier to code badly. All of the abstract data types and code hiding can easily turn an O(n) algorithm into an O(n^2) one. As with any language, proper understanding of the code (libraries) is the key. The STL Programmer's Guide is an excellent resource for understanding the limitations and proper use of the STL.
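Here is one way that O(n) to O(n^2) slide can happen, as a small sketch with illustrative function names. Both functions reverse a vector and both look like a single pass, but inserting at the front of a std::vector shifts every existing element each time.

```cpp
#include <vector>

// Looks linear, is quadratic: each front insert moves all current elements.
std::vector<int> reversedSlow(const std::vector<int>& in) {
    std::vector<int> out;
    for (int x : in) out.insert(out.begin(), x);
    return out;
}

// Linear: build the result once from reverse iterators.
std::vector<int> reversedFast(const std::vector<int>& in) {
    return std::vector<int>(in.rbegin(), in.rend());
}
```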
To conclude with a "real world" example, I am currently on a team developing an optimizing compiler in C++. It's been a huge learning process, as any student project is, given that we started out with little compiler experience and only marginally more C++ experience. But throughout the project I have continually improved things by learning just a bit more about how C++ and the STL work. At this point, our compiler has similar functionality to gcc and runs in the same or less memory space. It's quite a bit slower, but I attribute that more to some non-optimal algorithms and more complex dataflow analysis than to C++.
In addition, by using C++'s ability to overload functions, I was able to quickly hack up the LeakTracer tool (which overloads operator new and operator delete), providing many memory debugging features and in the process reducing our memory consumption significantly. All in the span of a week.
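The general technique looks roughly like the sketch below. This is not LeakTracer's code; the counter and the liveAllocations helper are hypothetical, and a real tool would also record where each allocation came from. Defining the global operator new and operator delete replaces them for the whole program, so every new-expression ends up being counted.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical bookkeeping: number of allocations not yet freed.
static std::atomic<long> g_liveAllocations{0};

// Replacement global allocation function: allocate, then count.
void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    ++g_liveAllocations;
    return p;
}

// Replacement global deallocation function: uncount, then free.
void operator delete(void* p) noexcept {
    if (p) {
        --g_liveAllocations;
        std::free(p);
    }
}

// A non-zero value at program exit suggests a leak.
long liveAllocations() { return g_liveAllocations.load(); }
```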
More technical readers already familiar with programming and at least one C-based language might find the pace slow and the extra explanations unnecessary. Heller's target audience is definitely the neophyte, not the experienced developer. The latter might question the subject matter covered. Why build a vector class instead of using C-style arrays? Why not C-style strings? I suspect the author is more concerned with helping his students avoid the kind of pitfalls C++ was designed to work around. It may not be the traditional approach, but it's valid and it will produce decent programmers, who can learn C++ on its own merits.
I don't understand this comment. Questions such as, "Why build a vector class instead of using C-style arrays?" and, "Why not C-style strings?" are fundamental C++ questions. Now, it's true that the neophyte will not immediately question why a vector class is used, but IMHO it is an issue that should be addressed.
Additionally, the comment that readers can "learn C++ on its own merits" leads me to think that the book doesn't really cover C++, but rather covers "object-based C." Does the book even get into polymorphism?
Good supplements to this book (and essential reading for every C++ programmer, IMHO) would be the C++ FAQ Lite and STL Programmer's Guide. The Design Patterns book by Gamma et al. is also essential reading, but is probably a little advanced for the newbie.
Especially if the programmer messes things up by trying to hand-optimize!
Having the programmer use goto statements and hand-unroll loops usually makes things worse. It is far, far better to have the programmer concentrate on the high-level algorithm design.
Small-scale hand-optimization will do squat if your algorithm is exponential.
The compiler can do a whole lot if the programmer lets it do its job. That includes making careful use of C++ inlining and templates!
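To put the algorithm point in code, here is the textbook example (function names are mine). No amount of unrolling or goto trickery rescues the first version, because the number of calls grows exponentially with n; the second does the same job in n steps.

```cpp
#include <cstdint>

// Exponential: recomputes the same subproblems over and over.
std::uint64_t fibNaive(unsigned n) {
    return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
}

// Linear: carries the last two values forward.
std::uint64_t fibLinear(unsigned n) {
    std::uint64_t a = 0, b = 1;
    for (unsigned i = 0; i < n; ++i) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```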
Of course they will! Both things will happen. People will use it for MP3's and Sony will call it a computer. After all, WebTV is pretty popular.
Sony's plan is to make PS2 the command console for your entire home network. That means all your A/V stuff, appliances, etc. will go through PS2. This thing is much, much more than a gaming console.
Intel is scared of PS2. I got this from a friend on the inside, and I don't think he's making this stuff up.
--
Re:I think there's a simple technical solution
on 'Battling Censorware' · Score: 1
Sort of like HyperTalk, the scripting language in Apple's old HyperCard?
"get the answer; put it into the variable," indeed. :)
I would argue though that NT isn't really derivative of CP/M; it more represents its own chain. Sub-operating systems like Java might eventually grow into their own systems.
NT is much more a derivative of VMS than CP/M. Did you never get the joke?
Compilers do perform dataflow analysis. They must do this to prove the correctness of transformations.
The limitation is that the dataflow analysis is not completely precise. This is because the compiler only has static information. It must assume both paths after a branch are taken, for example. When a join point is reached, the compiler must conservatively resolve the dataflow information from the separate paths to the join point. So for example if variable a is defined in both paths, the compiler must assume both definitions reach the join point.
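A tiny sketch of that conservative merge, assuming reaching definitions are modelled as sets of labels (the label format and function name are made up). At the join point the compiler keeps the union of what arrives on each path, so a use of a after the join has to be treated as possibly seeing either definition.

```cpp
#include <set>
#include <string>

// Each reaching definition is named by a label such as "a@then".
using DefSet = std::set<std::string>;

// Conservative merge at a join point: anything that reaches along either
// incoming path is assumed to reach the join.
DefSet mergeAtJoin(const DefSet& fromThen, const DefSet& fromElse) {
    DefSet merged = fromThen;
    merged.insert(fromElse.begin(), fromElse.end());
    return merged;
}
```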
Dynamo improves on this by forming dynamic traces. Now the compiler can optimize exactly the path through the program taken at run-time. Essentially, there are no join points, so there is no need to combine dataflow information from multiple paths. The trade-off is between constructing long traces and being able to reuse them frequently enough to recover the penalty of constructing and optimizing them.
Yes, you are. HP has some of the best compiler people around.
The thing about static compilers is that they have no idea what happens at run-time. Profiling has been used to mitigate this somewhat, but it's still a huge problem.
Accesses through memory are slow, so you want to get rid of them. One way to do this is through register allocation. Unfortunately, even if an infinite number of registers were available, you couldn't allocate everything to registers.
Why? Because we use pointers. There are multiple names for the same data running around in our little electronic brain. When you allocate something to a register, you bind it to one name. This is by definition incorrect for aliased data (data with multiple names).
Optimizations like register promotion try to get around this by allocating things in regions where the compiler can prove it only has one name. But this is exceedingly difficult when you have things like function calls which must be assumed to access and modify lots of data.
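Here is a small sketch of the aliasing problem (function names are illustrative). The second function is what the first would look like if the compiler kept *src in a register; that is only safe when dst and src are known to point at different objects, which is exactly what the compiler usually can't prove.

```cpp
// Adds *src to *dst twice, reading memory each time.
void addTwice(int* dst, const int* src) {
    *dst += *src;
    *dst += *src;  // must reload: the store above may have changed *src
}

// Same code with *src held in a "register" (a local). Correct only if
// dst and src never refer to the same object.
void addTwiceCached(int* dst, const int* src) {
    int cached = *src;
    *dst += cached;
    *dst += cached;
}
```

Called with two distinct variables the two agree; called as addTwice(&x, &x) and addTwiceCached(&x, &x) with x starting at 1, the first leaves 4 and the second leaves 3.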
I won't even get into the problem of static instruction scheduling or other optimizations like partial redundancy elimination.
In short, aliasing through memory is nearly impossible to track accurately at static compile time. At run-time, the machine knows exactly which memory accesses reference which data, so things like run-time register allocation can do a better job. Crusoe does this to a limited extent.
Dynamo is essentially a software trace cache. Except that when forming the trace, it does transformations like Common Subexpression Elimination and other traditional compiler manipulations.
IBM has the Daisy project, which does code morphing from PPC to a VLIW ISA. I believe it also does some run-time optimizations. Projects like DyC and Tempo have been compiling at run-time for a while now.
I like to think of dynamic compilation in terms of the stock market. Which would you rather do: trade stocks with only limited information about their past behavior (and sometimes none at all), or trade stocks after having observed the absolutely most recent trends? I'll bet that if you pick the first strategy and I pick the second, I'll beat you every time.
That said, there are tricks you can pull with static compilation. IA64 has the ALAT, which lets the machine track when store addresses match load addresses. This lets the static compiler speculatively move a load ahead of the store. If the store conflicts, the machine will execute some compiler-provided code to fix up the error. Essentially, the compiler is making an assumption that the load and store do not reference the same data and is communicating that assumption to the machine. The machine checks the assumption and invokes some fixup code if it proves to be incorrect.
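A software imitation of that idea, purely as a sketch: on IA64 the address check is done by the hardware table, whereas here it is an explicit pointer comparison, and the function names are invented. The speculative version does the load and the dependent work before the store, then repairs the result if the assumption turned out wrong.

```cpp
// Program order: store first, then load and compute.
int storeThenLoad(int* storeAddr, int value, const int* loadAddr) {
    *storeAddr = value;
    return *loadAddr + 1;
}

// Load hoisted above the store on the assumption that the two addresses
// differ; the check and fixup keep the result correct when they don't.
int storeThenLoadSpeculative(int* storeAddr, int value, const int* loadAddr) {
    int loaded = *loadAddr;       // speculative early load
    int result = loaded + 1;      // dependent work started early
    *storeAddr = value;
    if (storeAddr == loadAddr) {  // assumption check (done in hardware on IA64)
        result = *loadAddr + 1;   // fixup code: redo the load and the work
    }
    return result;
}
```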
What irks me is that every time something doesn't go right with Linux, it's Microsoft's fault. It's never the fault of NVIDIA, Matrox or Diamond. Because we all know how willing hardware manufacturers are to release their specs.
This is getting ridiculous. I don't care about Microsoft one way or the other. Some of their software works great for me. Some doesn't. The solution? Run vmware in Linux (I've got a very interesting and humorous story about an NT 4.0 install and attempted upgrade that should be good for a few laughs).
--
This is the single worst comment in the Linux kernel, and I don't even have to look at the rest of them to know it.
Fast code does not have to be messy. In C, goto should be used for very few things. One example is breaking out of a deeply nested loop.
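For reference, the nested-loop case looks something like this sketch (the grid search and its names are just an example). A single goto leaves both loops at once and lands on one shared exit path; in code with no shared cleanup, a plain return would do the same job.

```cpp
#include <cstddef>
#include <vector>

// Finds the first cell equal to target, scanning row by row.
// On success stores its position in row/col and returns true.
bool findCell(const std::vector<std::vector<int>>& grid, int target,
              std::size_t& row, std::size_t& col) {
    bool found = false;
    for (std::size_t r = 0; r < grid.size(); ++r) {
        for (std::size_t c = 0; c < grid[r].size(); ++c) {
            if (grid[r][c] == target) {
                row = r;
                col = c;
                found = true;
                goto done;  // leave both loops in one jump
            }
        }
    }
done:
    // Shared exit path: any cleanup or logging would go here once.
    return found;
}
```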
In any event, the lack of registers on the x86 is really the main problem anyway.
As for profiling, it may be a lost art, but we do quite a bit of it here. It's helped us speed things up tremendously.
--
NT is much more a derivative of VMS than CP/M. Did you never get the joke?
HAL->IBM : VMS->WNT
--
The limitation is that the dataflow analysis is not completely precise. This is because the compiler only has static information. It must assume both paths after a branch are taken, for example. When a join point is reached, the compiler must conservatively resolve the dataflow information from the separate paths to the join point. So for example if variable a is defined in both paths, the compiler must assume both definitions reach the join point.
Dynamo improves on this by forming dynamic traces. Now the compiler can optimize exactly the path through the program taken at run-time. Essentially, there are no join points, so there is no need to combine dataflow information from multiple paths. The trade-off is between constructing long traces and being able to reuse them frequently enough to recover the penalty of constructing and optimizing them.
--
The thing about static compilers is that they have no idea what happens at run-time. Profiling has been used to mitigate this somewhat, but it's still a huge problem.
Accesses through memory are slow, so you want to get rid of them. One way to do this is through register allocation. Unfortunately, even if an infinite number of registers were available, you couldn't allocate everything to registers.
Why? Because we use pointers. There are multiple names for the same data running around in our little electronic brain. When you allocate something to a register, you bind it to one name. This is by definition incorrect for aliased data (data with multiple names).
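A two-pointer example shows the problem. The function name is made up for illustration.

```cpp
// If a and b may point to the same int, the compiler cannot keep *a in a
// register across the store through b. It has to reload *a from memory.
int store_and_reload(int* a, int* b) {
    *a = 1;
    *b = 2;      // may or may not overwrite *a
    return *a;   // 1 if a and b are distinct, 2 if they alias
}
```

Called with two different variables it returns 1; called with the same address twice it returns 2. The compiler sees only the function body, so it has to emit code that is right for both cases.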
Optimizations like register promotion try to get around this by allocating things in regions where the compiler can prove it only has one name. But this is exceedingly difficult when you have things like function calls which must be assumed to access and modify lots of data.
I won't even get into the problem of static instruction scheduling or other optimizations like partial redundancy elimination.
In short, aliasing through memory is nearly impossible to track accurately at static compile time. At run-time, the machine knows exactly which memory accesses reference which data, so things like run-time register allocation can do a better job. Crusoe does this to a limited extent.
Dynamo is essentially a software trace cache, except that when forming the trace, it does transformations like Common Subexpression Elimination and other traditional compiler manipulations.
IBM has the Daisy project, which does code morphing from PPC to a VLIW ISA. I believe it also does some run-time optimizations. Projects like DyC and Tempo have been compiling at run-time for a while now.
I like to think of dynamic compilation in terms of the stock market. Which would you rather do: trade stocks with only limited information about their past behavior (and sometimes none at all), or trade stocks after having observed the absolutely most recent trends? I'll bet that if you pick the first strategy and I pick the second, I'll beat you every time.
That said, there are tricks you can pull with static compilation. IA64 has the ALAT, which lets the machine track when store addresses match load addresses. This lets the static compiler speculatively move a load ahead of the store. If the store conflicts, the machine will execute some compiler-provided code to fix up the error. Essentially, the compiler is making an assumption that the load and store do not reference the same data and is communicating that assumption to the machine. The machine checks the assumption and invokes some fixup code if it proves to be incorrect.
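The speculate, check, fix up sequence can be modeled in software. To be clear, this is a toy with a single tracked address, not the IA64 instructions or the real ALAT structure; ToyAlat and store_then_load are hypothetical names.

```cpp
// Toy software model of the idea: a one-entry table of tracked loads.
struct ToyAlat {
    const int* tracked = nullptr;  // address of the outstanding early load

    // Early (speculative) load: read now and remember the address.
    int advanced_load(const int* addr) {
        tracked = addr;
        return *addr;
    }

    // Every store checks whether it hits the tracked address.
    void store(int* addr, int value) {
        *addr = value;
        if (addr == tracked) tracked = nullptr;  // conflict: drop the entry
    }

    // Check at the load's original position: true if the speculation held.
    bool check(const int* addr) const { return tracked == addr; }
};

// Original program order:  *p = v;  return *q + 1;
// Speculated order: load *q first, then store, then check and fix up.
int store_then_load(ToyAlat& alat, int* p, int v, int* q) {
    int loaded = alat.advanced_load(q);  // load hoisted above the store
    alat.store(p, v);
    if (!alat.check(q)) {
        loaded = *q;                     // fixup code: redo the load
    }
    return loaded + 1;
}
```

When p and q are different, the early load stands and the fixup never runs. When they are the same, the store invalidates the entry and the fixup reloads the fresh value, so the result matches the original program order either way.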
--
What irks me is that every time something doesn't go right with Linux, it's Microsoft's fault. It's never the fault of NVIDIA, Matrox or Diamond. Because we all know how willing hardware manufacturers are to release their specs.
This is getting ridiculous. I don't care about Microsoft one way or the other. Some of their software works great for me. Some doesn't. The solution? Run vmware in Linux (I've got a very interesting and humorous story about an NT 4.0 install and attempted upgrade that should be good for a few laughs).
--