JVMs don't have the intelligence to rearrange objects in the heap into a layout that favors cache locality. This happens only to a limited extent:
1) GC continually compacts the heap, avoiding free-space fragmentation. This results in denser cache lines, which means better cache usage.
2) Because a garbage-collected heap allows linear allocation - i.e., Java's "new" is basically as simple as "return (freePos += requestedSize)" - objects that are allocated together are typically grouped in one contiguous chunk of memory, while in C/C++ they could be scattered all over the heap.
The second factor can produce significant gains for Java, but only for very large and complex apps, with large heaps containing tons of objects (not the case of microbenchmarks that allocate everything at startup and on a clean heap). And even in this situation, modern C/C++ runtimes have significantly better heap managers than the naive, 50-line freelist malloc() algorithm used in the 70's. And when this fails, optimized C/C++ apps will resort to custom allocators.
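To make the "linear allocation" point concrete, here is a minimal sketch of a bump-pointer region. The class and method names are my own invention and this is not how any particular JVM implements it; unlike the one-liner quoted above, it returns the start offset of the new block and adds a bounds check.

```java
// Hypothetical sketch of linear ("bump-pointer") allocation; not real JVM code.
final class BumpRegion {
    private final int capacity; // size of the region in bytes
    private int freePos = 0;    // offset of the first free byte

    BumpRegion(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of the new block, or -1 when the region is full. */
    int allocate(int requestedSize) {
        if (requestedSize < 0 || requestedSize > capacity - freePos) {
            return -1; // a real collector would trigger a GC cycle here
        }
        int start = freePos;
        freePos += requestedSize; // the whole cost of "new": one addition
        return start;
    }
}
```

Consecutive calls return adjacent offsets (0, 16, 40, ...), which is exactly why objects allocated together end up next to each other in memory.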
The #1 reason that many microbenchmarks show Java beating C is that the top JIT compilers are extremely aggressive with three complementary tactics:
1) Profile-driven compilation. This is trivial to implement in a dynamic compiler, but for static compilers it's cumbersome enough (it requires extensive, up-to-date coverage tests for all performance-sensitive code) that 99% of all native programs don't use that feature even if the compiler supports it.
2) Deoptimization. The JIT can make heavy bets, e.g. "in this virtual call for a method of the Number type, the actual receiver is always a Double", and generate faster code. If this bet eventually goes bad (e.g. after a gazillion calls of the optimized code with Double arguments, it's called with an Integer), the JVM is able to efficiently trap this, fix the compiled code ("deoptimize" the now-invalid code), and go ahead.
3) Machine-specific optimization. A JVM will fine-tune all generated code for your specific system configuration, down to CPU stepping and precise cache hierarchy. Statically compiled code must typically be compatible with some reasonable configuration, e.g. "any Pentium or better". Even a fanatic Gentoo user who compiles everything for each machine is behind a JVM, because the C/C++ compilers simply don't have sufficient -arch options to match JVMs.
The last item is another situation that may deliver cache-related benefits, but this is because JVMs are known to generate extremely cache-friendly code (reordering, prefetching instructions for arrays, etc.).
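The deoptimization bet from item 2 above can be simulated in plain Java. This is only a sketch under my own assumptions: the class name, the counter and the "twice" operation are hypothetical, and a real JIT does all of this in generated machine code rather than in source.

```java
// Hypothetical simulation of a speculative fast path with a guard.
final class SpeculativeCallSite {
    private boolean speculationValid = true; // the bet: the receiver is always a Double
    private int deoptCount = 0;

    double twice(Number n) {
        if (speculationValid) {
            if (n instanceof Double) {
                // Fast path: the Double case is handled directly, no virtual dispatch.
                return ((Double) n).doubleValue() * 2.0;
            }
            // Guard failed: throw away the specialized path ("deoptimize").
            speculationValid = false;
            deoptCount++;
        }
        // Generic path: ordinary virtual call through Number.
        return n.doubleValue() * 2.0;
    }

    int deoptCount() {
        return deoptCount;
    }
}
```

The result is always correct; only the route changes. The first Integer argument trips the guard once, and from then on every call takes the generic path.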
Let's not even begin to compare Microsoft's vs Sun's power to push stuff to the desktops of the masses, it's not even funny.
For a reality check, see in Jonathan Schwartz's blog how Microsoft bought, for an undisclosed but certainly huge pile of cash, the privilege of bundling some of their stuff (e.g. Windows Live toolbar) with Sun's Java updates.
I wonder if this means Sun is going to pull out of Orbit and come up with some J2ME version of JavaFX?
Java FX Mobile was also released (but still in beta stage; FCS planned for next spring). Check Terrence Barr's blog. In fact, the mobile version is a big part of JavaFX's grand scheme. Deploy the exact same code on desktop, web and mobile devices - it's revolutionary and unique, for anything as rich as JavaFX.
...he didn't pay much attention to standard values of novels; things like, say, human emotions, fast action, sex, or even much real suspense - the plot is usually "logical" and the real thrill for the reader is being taught the fine details that connect Point A to Point B. A lecturer's style, if you wish. In other literary aspects, like narrative structure and command of the English language, Asimov seems quite strong (I'm a non-native English speaker, having read most of his works translated, but as an adult [and professional bilingual writer] I've read a few originals - e.g. Gold - and truly liked them.) Many readers actually love that style in the genre of hard-SF. No literary decorations, no convoluted characters... just the fundamentals: GREAT ideas involving future technology and its interaction with society, and a competent and serious development of these ideas. Save for exceptions like the Lucky Starr space-cowboy series; and even those books were much above the level of "entertainment sci-fi" like Flash Gordon.
The movie surprised me with how faithful it was to the dozens of Asimov robot stories. Let me repeat: Asimov's themes fill the movie from start to finish. The movie's plot is entirely based on Asimov's four (yes, four) Laws of Robotics.
I agree. It also carries several other wins, like Asimov's view of future supercomputers (paternalistic mainframes); a quite good early Dr. Susan Calvin (a perfect one should be a much uglier actress but hey that's Hollywood - other aspects are all right); the fundamental theme of human values against the possibilities of AI (the plot with a robot saving Smith's life over a girl's). I even liked the few touches of comedy, e.g. the scene with the robot being a super-fast cooking helper (but it's actually a serious theme: how easily people will "sell" traditional values for the convenience of new tech).
I'll give that movie an 8, which is not perfect but still in the top 1% for movie adaptations of science fiction.
This migration won't work for systems that employ advanced JIT code generation, such as Java. Modern production JVMs, like Sun's and IBM's, will create native code on the fly - and they will produce code that's ultra-tuned for the specific processor they are running on. This means using the best instructions available (like SSEx), and also fine-tuning various behaviors, e.g. GC can be tuned for the L1/L2 cache sizes, and locking can be tuned to factors like number of CPUs/cores/hardware threads - so for example, if it's running on a uniprocessor/single-core machine, the JVM will simply not emit memory barrier instructions for memory model consistency.
And it's not only Java, we have an increasingly large number of JIT compilers that may employ similar tricks: Microsoft .NET (CLR); Flash 9+ (Tamarin) for ActionScript; Mozilla TraceMonkey and Google V8 for JavaScript; new LLVM-based runtimes for other languages... the list is only growing. Even for traditional statically compiled languages, some apps can have multiple shared libs compiled for different CPU levels, and choose the best lib at startup.
The only way I see around this problem is making ALL these runtimes and applications migration-aware. Each process should be notified before the migration, initiate some pre-migration task, and after the migration, be notified again to resume work and if necessary perform some post-migration step. Specifically for Java, the pre-migration step would need to "park" all threads in OSR safepoints, then free all JIT-generated code; and the post-migration step would retune/reconfigure the JVM for the new CPU, then unpark the threads - which would resume execution in interpreted mode until the JIT compiler recreates all native code for the new CPU. Fortunately this is relatively simple to do in JVMs, because all necessary plumbing is preexisting (safepoints, on-stack replacement... required for advanced GC and dynamic optimizations). And once new JVMs are enhanced with this feature, thousands of Java apps become magically migration-aware. Could be harder though for other runtimes.
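The notification protocol proposed above could look roughly like this. Everything here is hypothetical: no JVM or hypervisor exposes such an interface that I know of, and the "code cache" is just a map from method names to strings standing in for CPU-specific native code.

```java
// Hypothetical hooks; no existing JVM or OS API is implied.
interface MigrationAware {
    void beforeMigration();
    void afterMigration(String newCpu);
}

final class ToyRuntime implements MigrationAware {
    // Stand-in for JIT output: method name -> "native code" tagged with the CPU.
    private final java.util.Map<String, String> codeCache = new java.util.HashMap<>();
    private String cpu;
    private boolean threadsParked = false;

    ToyRuntime(String cpu) {
        this.cpu = cpu;
    }

    /** Stand-in for the JIT: compiles lazily for the CPU we are currently on. */
    String compile(String method) {
        if (threadsParked) {
            throw new IllegalStateException("threads are parked for migration");
        }
        return codeCache.computeIfAbsent(method, m -> m + "@" + cpu);
    }

    @Override
    public void beforeMigration() {
        threadsParked = true; // park all threads at safepoints
        codeCache.clear();    // free all CPU-specific generated code
    }

    @Override
    public void afterMigration(String newCpu) {
        cpu = newCpu;          // retune for the new processor
        threadsParked = false; // resume; code is recompiled on demand
    }

    int compiledMethods() {
        return codeCache.size();
    }
}
```

After a beforeMigration()/afterMigration("cpuB") pair, the same method is simply recompiled for the new CPU the next time it is needed, which is the "resume interpreted, then re-JIT" behavior described above.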
Still, very hot technology, just not as easy as we can imagine to get right and compatible with all applications.
A small nit: Tracemonkey doesn't compile entire functions. It compiles instruction streams, which means that if your function has a branch that's never taken that part of the branch will never get compiled.
I know that (have read all papers about tracing JIT). But traces can also span multiple methods (it does the equivalent of inlining) - so in terms of size of compilation units, it's equivalent to traditional compilers.
There's a tradeoff here, though, which is that you want the compilation to be fast (as in, avoiding the Java "let's be slow while we start up and jit all this code" syndrome).
We'll digress here, but... the startup of Java, today, is not dominated by JIT; modern JVMs have very lazy and extremely fast "client-side" JITs (of course, not as strongly-optimizing as server-side ones). APIs like Swing deserve most of that blame, as well as the dynamic classloading model that forces too much initialization (for all static data) to happen at loading time... Finally, Java is difficult to compare with other systems because it's massively "meta-circular" (even before Java-in-Java VMs like JikesRVM and Maxine); the VM is native, but 99% of all Java APIs are written in Pure Java. So it's not fair to compare Java with JavaScript (or most dynamic languages); these only boot fast because their APIs are implemented in C.
The faster your compiler needs to be, the less room you have to fancy optimizations.
The compilation speed vs. optimizations tradeoff is indeed important, but it's a war compiler writers are winning every year. Sun's Java7 will have a hybrid JIT (client+server) to provide both a fast JIT and a highly-optimizing JIT for the "really hot" spots. Tracing is the next revolution for systems that need VERY fast and lightweight JIT like JavaScript - because tracing basically takes some expensive compiler tasks, such as producing the SSA form, and makes them much cheaper (simply because a linear trace or a tree is a much nicer structure than a general control-flow graph). Both TraceMonkey and V8 also benefit from dynamic optimization (profile-driven and speculative), a la HotSpot. I don't see compilation time as a significant barrier for optimization in either VM - remember, they have only the app code to optimize; even high-end RIA apps are very "browser bound": most work is performed by the browser, which is a native and previously-booted component.
Constant folding, yes (actually happens on the interpreter level too).
Constant folding (and other opts) is more effective at the end of a full pipeline (or in several places)... at the interpreter level, you can only fold program-level constants, e.g. "var Pi=3.14;...; return 2*Pi*radius". But in a (real compiler's) LIR optimization phase, you can also fold compiler-induced values, e.g. the resolved address of an array's zero-index element.
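For readers who haven't seen it, this is roughly what folding the "2*Pi*radius" example looks like on an expression tree. The tiny AST below (Num, Name, Mul, Folder) is made up for illustration and only covers the program-level case; folding compiler-induced values in a LIR follows the same idea on a lower-level representation.

```java
// Hypothetical mini-AST, only to illustrate folding of program-level constants.
abstract class Expr {}

final class Num extends Expr {
    final double value;
    Num(double value) { this.value = value; }
}

final class Name extends Expr {
    final String id;
    Name(String id) { this.id = id; }
}

final class Mul extends Expr {
    final Expr left, right;
    Mul(Expr left, Expr right) { this.left = left; this.right = right; }
}

final class Folder {
    /** Folds bottom-up: a Mul of two constants is replaced by their product. */
    static Expr fold(Expr e) {
        if (e instanceof Mul) {
            Mul m = (Mul) e;
            Expr l = fold(m.left);
            Expr r = fold(m.right);
            if (l instanceof Num && r instanceof Num) {
                return new Num(((Num) l).value * ((Num) r).value);
            }
            return new Mul(l, r);
        }
        return e; // constants and variable names are already in final form
    }
}
```

With Pi already substituted, folding (2 * 3.14) * radius leaves a single multiplication of one constant by radius, so the constant part is computed once at compile time instead of on every call.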
Instruction scheduling, maybe not so much. I agree with your main point that V8 and Tracemonkey can get faster on these benchmarks. I'm just not convinced they'll necessarily end up that much faster than SFX.
Well, I think this potential (to wipe the floor with SFX) is clearly there... like I said, V8 and TM are still missing a boatload of important optimizations, including much low-hanging fruit. They don't need to get anywhere near, say, HotSpot Server or gcc -O3; it's not a problem if they never implement the really aggressive optimizations... even with only mid-range opts, they will become several times faster than today in many benchmarks. But SFX can't run that race; it's already close (IMHO) to the max potential of its architecture. They can just keep adding special-case opts, things like their regex JIT, which is a nice idea because JavaScript apps can rely on a lot of regexes... but you can't special-case everything.
These benchmark results are a bit debatable - I've seen different suites electing different "winners" and, while SunSpider seems to be the best, it's a long way from a robust benchmark like SPEC* or DaCapo.
In any event, even if SFX is leading the pack right now, that's because it's the most mature competitor, and its advantage won't last too long. I predict (and I write this logged in with my account, not AC, so I will be forever glorified when this becomes true in 12 months max) that both V8 and TraceMonkey will take the lead, leaving SFX in a safe third place permanently.
The reason is very simple. All these new JS VMs are JIT compilers, producing native code. But SFX is a context threaded JIT. Context threading is just a step beyond traditional direct-threaded interpreters: functions are 'compiled' into streams of CALLs into routines that implement each bytecode operation, but there is limited inlining (simple operations and branches), with a focus on reducing branch misprediction.
OTOH, both V8 and TraceMonkey are "real compilers" that emit real native code (not CALL streams) for entire functions (or even larger chunks of code, with inlining). This is necessary to enable traditional optimizations like register allocation, instruction scheduling, constant folding, loop unrolling etc. Some of these optimizations can be performed on a high-level intermediate code representation (HIR), but that's typically not worth the effort without real compilation. E.g., loop unrolling will just waste memory and i-cache efficiency if performed by a threaded interpreter/JIT... as the real benefit of unrolling is giving the compiler a much larger basic block to perform other opts like extra folding and bounds-check elimination, or real low-level tricks like exploiting SIMD registers and operations / Instruction-Level Parallelism / prefetching / branch predication etc.
The only reason why V8 and TraceMonkey don't completely 0wn the benchmarks today is that these JITs are still in their infancy. They have implemented the foundations (like V8's hidden classes or TM's tracing), but they have yet to implement dozens of important optimizations (including very easy ones - they just haven't had the time yet). Check some comments about V8's limitations. TM's developers have also commented on many limitations; to quote Andreas Gal: "If it talks to the DOM during the benchmark, we currently don't compile across such calls (we plan to for Beta2 though)". This and several other improvements are planned for future builds of Firefox 3.1. Notice that items like special support for DOM interactions and event handlers should be critical to some benchmarks - and of course to real-world RIA apps. I'm sure the V8 hackers are also working around the clock to fill in their own gaps. When both VMs are reasonably mature, SFX will have a VERY hard time competing (unless of course, they abandon the context threading model and mutate into a real compiler). Other optimizations, like JITted regex, can be implemented in all VMs and will eventually be ubiquitous.
So when east meets west, both sides see the world in a very different light. Except that we "see" them (and ourselves) in a much brighter light, since we have such amenities as free press, uncensored internet, and general freedom of speech.
That would only be true if the CPU is able to retire a sustained average of one instruction per clock cycle. SFGate's article makes a raw comparison between chips with different numbers of cores, threads and other factors, considering only GHz...
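A back-of-the-envelope version of that objection, as a sketch: peak instruction throughput is roughly clock rate times cores times instructions retired per cycle (IPC). The class below and the figures in the usage note are hypothetical, chosen only to show the arithmetic, and the formula ignores everything that makes real chips miss their peak.

```java
// Hypothetical back-of-the-envelope model; real throughput depends on far more.
final class Throughput {
    /** Idealized peak instructions per second for a chip. */
    static double instructionsPerSecond(double ghz, int cores, double ipc) {
        return ghz * 1e9 * cores * ipc;
    }
}
```

Under this model a 3.0 GHz single-core chip at IPC 1.0 peaks at 3e9 instructions per second, while a 2.0 GHz quad-core at IPC 1.5 peaks at 1.2e10: four times more work at a lower clock, which is why comparing GHz alone says very little.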
From the article: "Sinclair developed a scheme of assigning multiple BASIC keyword commands for each key, so users would have to press only one key (such as P for "PRINT") instead of typing out the entire command. Using the keyboard to type something that wasn't a BASIC command, however, turned out to be an exercise in frustration. Only masochists had any fun attempting word processing on the Timex Sinclair 1000."
I'm tired of this bashing of the Sinclair-family keyboards! Speaking as somebody who used one for over five years, I tell you that the multi-function keyboard was very efficient, at least for typing BASIC programs of course. Remember that all cheap 8-bit computers had to cut fabrication costs in items like cases, keyboards, power supplies etc; NONE of these machines had a decently built keyboard. With this economic constraint in mind, Sinclair solved two important problems: maximising typing speed for typical usage, and reducing wear-out of cheap keyboard components.
As for common text input, no problem because the ROM input routine was modal. The cursor would be toggled between several modes - it was a "K" for the main BASIC keywords or symbols, "E" for extended ones, "G" for graphics, "L" for letters, "C" for capitals and "?" to flag syntax errors in BASIC lines (an advanced feature, most machines would accept any input and only issue syntax error messages when you tried to run the program!). So you could type in any mode without continuous usage of SHIFT or other mode-changing keys. Another nicety was the embedded color-code input, made in "E" mode IIRC. Once you memorized the several functions assigned to each key, and got used to the modal system, you could type VERY fast compared to any other micro that also had low-quality keyboards but required typing I,N,P,U,T,SPACE for INPUT and so on. (The Sinclair editor didn't require spaces; its BASIC pretty-printer inserted spaces as necessary... and these spaces didn't consume memory, as happened with other micros, where people would resort to cryptic space-less coding like "FORX=0TO10:PRINTX:NEXTX", while Sinclair users were very proud that their BASIC listings were always readable with canonical spacing.)
P.S.: The model I used was a Brazilian TK-95, a Spectrum 48 clone that had a better keyboard, see photo and article here. This keyboard was among the best in this class of computers, I used it for 5+ years without any key failing... even though I didn't program much in BASIC; by the end of the first year I was already hacking only in Z80, so I had to type stuff letter by letter. The keyboard was good enough for this and word processing - similar to the C64, but sans stupid layout problems. (I concede that the original rubber keyboard was bad for fast non-BASIC typing, like word processing.)
"Eclipse does some things really well -- taking advantage of being a Java-based editor, it can use RTTI to help in the code-writing process."
Not true. Eclipse (as well as NetBeans and ALL other Java IDEs) never uses RTTI (i.e. instanceof, reflection and other dynamic / runtime Java features). It does everything it does by parsing the source code. You know, in a decent language like Java (grammar of sane size and complexity, no preprocessor etc.), editors can perform a fast but precise parsing of your entire projects in real time. And they even do that without monster-sized files - like "precompiled headers" or "source browsing information", common in C/C++ IDEs that struggle to implement a modest approximation of all smart-editing, refactoring and other features of Java IDEs. ;-)
Another thing that bothers me: on Win2K and XP, after every batch of monthly hotfix updates I would delete the hidden Windows/KBxxxx directories that the hotfixes create with their uninstallers and backups of replaced files. I do that because I never, ever, remove any such updates, from small hotfixes to huge service packs (downgrading is for losers)... so, why waste disk space with their uninstallers? Now on Vista I cannot do that anymore, because its Windows/* file organization is orders of magnitude larger, more complex and more obscure than ever before. I tried to identify which files and directories contain hotfix uninstallation data, but it's a mess; it'd take a lot of time to identify those files (I know how to do that, e.g. with ProcMon logs) and I'd still fear breaking the whole OS if I remove or update any files manually.
Well, Vista is so big on disk that wasting a few dozen MB on unwanted uninstallers does not seem like a big deal anymore. But it IS a big deal when an OS's base file structure becomes so stupidly complex that somebody like me (Windows user since 3.0, ex-sysadmin, and skilled software developer) is too scared to touch it with his bare hands...
Complementing your list (also without Google's help...)
Brooks, Frederick P.: The Mythical Man-Month, No Silver Bullet paper. Processualist.
Milner, Robin: From the Hindley-Milner type system used in ML and Haskell, and perhaps other functional languages too. Type inference stuff that gives you all the advantages of static typing without most of the type declaration work.
"1984 Wirth, Niklaus: You can call him by value; or you can call him by name." LOL!!
Codd, Edgar F.: Database pioneer, recently RIP.
Hoare, C. Antony R.: And one of the greatest conference speakers I've ever seen, a few years ago at ECOOP. Da man used a few old-style transparent sheets, with hand drawings of the stuff he was talking about (a formalism/algebra to track pointers in ways that should be useful to compilers etc), superposing and moving several sheets together to make animations and special effects. Terribly humiliating for people like me who have to sweat hours on a colorful PowerPoint before making a good presentation.
Here: http://labs.google.com/papers/gfs.html. Abstract: "We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. (...)" Pretty damn cool stuff, very advanced but perhaps a bit too tuned for Google's needs. See also the papers on their own clustering technology and distributed programming framework (MapReduce), which go hand in hand with GFS. These are some of the technical secrets behind Google's search engine and other apps, it's funny how little you hear about them. For massive tasks like Google search, I don't think anybody can compete with standard technology, i.e. big machines (vertical scalability), off-the-shelf clusters and SQL servers, and conventional programming techniques.
Windows users still waiting for WinFS... (on "Sun Releases ZFS")
...now promised for some post-Vista date. I know both FS'es are not the same thing, but both are supposed to be revolutions in this field, and there are important points in common like transactional semantics and much easier administration of large volumes of data.
I tried the WinFS beta a couple of months ago, and was horrified by the amount of RAM it sucks. I got close to 300 MB running only Windows and the WinFS daemons, without any application running, and without using any feature of WinFS. If the best innovation Microsoft can do is installing a full-blown instance of SQL Server disguised as a filesystem, and if ZFS does not do that (I didn't test it), then Sun is pretty comfortable in this particular competition.
Slashdot keeps posting articles about "Java killers" even when they are forced to rehash year-old products, articles and so-called benchmarks that everybody has already seen (and dismissed) one year ago. Another recent example is Curl (which is new and will probably not succeed, but this will not stop Slashdot from posting dozens of "Curl kicks Java" articles in the following couple of years until Curl fades into oblivion).
When it comes to performance, I will not mention the obvious all over again (Java not being slow today etc) but rather point out that you can produce an impressive gfx demo even with GW-BASIC, if you have a good interface to OpenGL (or other native, hi-perf graphics library), simply because it's the native code (and better: dedicated HW) that will be doing virtually all the hard work. These demos are worth nothing, unless you can see that the demo app is doing a significant amount of work before piping data to the native engine. But if you are a believer, just fetch JDK1.4.0-beta2 and the JCanyon demo from JavaOne, which totally humiliates the tiny "spinning teapot" demos of all "Java killers" I can find.
It is true that iexplore.exe processes are gone without any open browser windows. But lots of MSIE support DLLs are always loaded by explorer.exe; look at its DLLs with NTHandleEx or another process inspector and you'll have some surprises...
7 being the first natural number whose inverse produces a composite periodic sequence, thus making an infinite sequence of 42's possible, and there they are -- in the very first interesting rational number we can find!!
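For the skeptics, the digits are easy to check with schoolbook long division. A small sketch (the class and method names are mine):

```java
// Hypothetical helper: decimal digits of 1/n by long division.
final class UnitFraction {
    /** Returns the first `count` digits of 1/n after the decimal point. */
    static String digits(int n, int count) {
        StringBuilder sb = new StringBuilder();
        int remainder = 1;
        for (int i = 0; i < count; i++) {
            remainder *= 10;
            sb.append(remainder / n); // next digit of the quotient
            remainder %= n;           // carry the remainder to the next step
        }
        return sb.toString();
    }
}
```

UnitFraction.digits(7, 12) gives "142857142857": the six-digit period 142857 repeats forever, and every copy carries its own 42.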
Will this version finally come with the NTFS driver prebuilt (read-only is OK if it's not yet robust), or will the large number of NT+Linux dual-booters be required to config & rebuild the kernel for this single reason, as in all distros?:-(
If and when law enforcement becomes efficient on the Internet, people will just start to use encryption MASSIVELY. Every single thing from IRC to Napster will run on top of very strong encryption, and what are the net-cops going to do then (considering that they've already lost the fight to prevent use/export of encryption software)? Have an entire farm of supercomputers to break every single surfer's packets? This is a very substantial difference from "real world" law enforcement, where people have no access to near-perfect means to do anything they want without leaving evidence.
Google's Android, and Sun's Java FX (FCS expected by Dec 02) and Java FX Mobile (expected sometime in late Q2'2009).
This migration won't work for systems that employ advanced JIT code generation, such as Java. Modern production JVMs, like Sun's and IBM's, will create native code on the fly - and they will produce code that's ultra tuned for the specific processor that is running. This means using the best instructions available (like SSEx), and also fine-tune various behaviors, e.g. GC can be tuned for the L1/L2 cache sizes, and locking can be tuned to factors like number of CPUs/cores/hardware threads - so for example, if it's running on a uniprocessor/single-core machine, the JVM will simply not emit memory barrier instructions for memory model consistency.
And it's not only Java, we have an increasing large number of JIT compilers that may employ similar tricks: Microsoft .NET (CLR); Flash 9+ (Tamarin) for ActionScript; Mozilla TraceMonkey and Google V8 for JavaScript; new LLVM-based runtimes for other languages... the list is only growing. Even for traditional static-compiled languages, some apps can have multiple shared libs compiler for different CPU levels, and choose the best lib at startup.
The only way I see around this problem is making ALL these runtimes and applications migration-aware. Each process should be notified before the migration, initiate some pre-migration task, and after the migration, being notified again to resume work and if necessary perform some post-migration step. Specifically for Java, the pre-migration would need to "park" all threads in OSR safepoints, then free all JIT-generated code; and in the after-migration, retune/config itself for the new CPU, then unpark the threads - that would resume execution in interpreted mode until the JIT compiler recreates all native code for the new CPU. Fortunately this is relatively simple to do in JVMs, because all necessary plumbing is preexisting (safepoints, on-stack replacement... required for advanced GC and dynamic optimizations). And once a new JVMs are enhanced with this feature, thousands of Java apps become magically migration-aware. Could be harder though for other runtimes.
Still, very hot technology, just not as easy as we can imagine to get right and compatible with all applications.
A small nit: Tracemonkey doesn't compile entire functions. It compiles instruction streams, which means that if your function has a branch that's never taken that part of the branch will never get compiled.
I know that (have read all papers about tracing JIT). But traces can also span multiple methods (it does the equivalent of inlining) - so in terms of size of compilation units, it's equivalent to traditional compilers.
There's a tradeoff here, though, which is that you want the compilation to be fast (as in, avoiding the Java "let's be slow while we start up and jit all this code" syndrome).
We'll digress here, but... the startup of Java, today, is not dominated by JIT; modern JVMs have very lazy and extremely fast "client-side" JITs (of course, not as strongly-optimizing as server-side ones). APIs like Swing deserve most of that blame, as well as the dynamic classloading model that forces too much initialization (for all static data) to happen at loading time... Finally, Java is difficult to compare to other systems because it's massively "meta-circular" (even before Java-in-Java VMs like JikesRVM and Maxine); the VM is native, but 99% of all Java APIs are written in Pure Java. So it's not fair comparing Java to JavaScript (or most dynamic languages), these only boot fast because their APIs are implemented in C.
The faster your compiler needs to be, the less room you have for fancy optimizations.
The compilation speed vs. optimizations tradeoff is indeed important, but it's a war compiler writers are winning every year. Sun's Java7 will have a hybrid JIT (client+server) to provide both fast JIT, and highly-optimizing JIT for the "really hot" spots. Tracing is the next revolution for systems that need VERY fast and lightweight JIT like JavaScript - because tracing basically takes some NP-hard compiler tasks, such as producing the SSA form, and converts them into polynomial tasks (simply because a Tree is a much nicer structure than a DAG). Both TraceMonkey and V8 benefit also from dynamic optimization (profile-driven and speculative), a la HotSpot. I don't see compilation time as a significant barrier for optimization in either VM - remember, they have only the app code to optimize; even high-end RIA apps are very "browser bound", most work is performed by the browser, that is a native and previously-booted component.
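A toy model of the hybrid (client+server) idea, in Python: code starts interpreted, gets a quick compile once it is warm, and only the really hot spots get the expensive optimizing compile. The class name and both thresholds are invented for illustration; real JVMs use their own counters and heuristics.

```python
class TieredFunction:
    """Toy model of tiered compilation; thresholds are made-up numbers."""

    FAST_JIT_AFTER = 3       # hypothetical: calls before the quick compile
    OPTIMIZING_AFTER = 10    # hypothetical: calls before the heavy compile

    def __init__(self, func):
        self.func = func
        self.calls = 0
        self.tier = "interpreted"

    def __call__(self, *args):
        self.calls += 1
        if self.calls > self.OPTIMIZING_AFTER:
            self.tier = "optimizing-jit"
        elif self.calls > self.FAST_JIT_AFTER:
            self.tier = "fast-jit"
        return self.func(*args)   # same result whatever the tier


square = TieredFunction(lambda x: x * x)
tiers = []
for i in range(12):
    square(i)
    tiers.append(square.tier)
```

Code that runs only a handful of times never pays for the heavy compiler, which is why startup does not have to suffer.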
Constant folding, yes (actually happens on the interpreter level too).
Constant folding (and other opts) is more effective in the end of a full pipeline (or in several places)... at the interpreter level, you can just fold program-level constants, e.g. "var Pi=3.14; ...; return 2*Pi*radius". But in a (real compiler's) LIR optimization phase, you can fold loads of compiler-induced variables, e.g. the resolved address of an array's zero-index element.
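A minimal sketch of the difference, in Python, using a tiny tuple-based expression tree. The tree format and the names array_base and header_size are assumptions made up for this example, not any VM's real IR.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, consts):
    """Fold constants in a tiny expression tree.

    expr is a number, a variable name (str), or a tuple (op, left, right).
    consts maps names known to be constant to their values.
    """
    if isinstance(expr, str):
        return consts.get(expr, expr)    # substitute known constants
    if not isinstance(expr, tuple):
        return expr                      # already a literal
    op, left, right = expr
    left, right = fold(left, consts), fold(right, consts)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)      # both sides known: compute now
    return (op, left, right)


# Program-level constant: 2*Pi*radius, parsed as (2*Pi)*radius
source_level = fold(("*", ("*", 2, "Pi"), "radius"), {"Pi": 3.14})

# Compiler-induced values: address of element i once the array's base
# address and header size have been resolved (both names hypothetical)
low_level = fold(
    ("+", ("+", "array_base", "header_size"), ("*", "i", 8)),
    {"array_base": 0x1000, "header_size": 16},
)
```

The first call folds 2*Pi into one literal; the second collapses the base-plus-header address into a single constant, which is something you can only do late in the pipeline, after those values exist.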
Instruction scheduling, maybe not so much. I agree with your main point that V8 and Tracemonkey can get faster on these benchmarks. I'm just not convinced they'll necessarily end up that much faster than SFX.
Well, I think this potential (to wipe the floor with SFX) is clearly there... like I said, V8 and TM are still missing a boatload of important optimizations, including many low-hanging fruits. They don't need to get anywhere near, say, HotSpot Server or gcc -O3; it's not a problem if they never implement the really aggressive optimizations... even with only mid-range opts, they will become several times faster than today in many benchmarks. But SFX can't run that race; it's already close (IMHO) to the max potential of its architecture. They can just keep adding special-case opts, things like their regex JIT which is a nice idea because JavaScript apps can rely on a lot of regexes... but you can't special-case everything.
These benchmark results are a bit debatable - I've seen different suites electing different "winners" and, while SunSpider seems to be the best, it's a long way from a robust benchmark like SPEC* or DaCapo.
In any event, even if SFX is leading the pack right now, that's because it's the most mature competitor, and its advantage won't last too long. I predict (and I write this logged with my account, not AC, so I would be forever glorified when this becomes true in 12 months max) that both V8 and TraceMonkey will take the lead, leaving SFX in a safe third place permanently.
The reason is very simple. All these new JS VMs are JIT compilers, producing native code. But SFX is a context threaded JIT. Context threading is just a step beyond traditional direct-threaded interpreters: functions are 'compiled' into streams of CALLs into routines that implement each bytecode operation, but there is limited inlining (simple operations and branches), with a focus on reducing branch misprediction.
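A loose analogy in Python of "compiling" a function into a stream of calls to per-bytecode routines. The opcode set and function names are invented for the example, and Python can't show the part that matters in a real context-threaded JIT (native CALL instructions and branch prediction), so treat it as a sketch of the structure only.

```python
def op_push(stack, arg):
    stack.append(arg)


def op_add(stack, _):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def op_mul(stack, _):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


ROUTINES = {"push": op_push, "add": op_add, "mul": op_mul}


def interpret(bytecode):
    """Classic interpreter: decode and dispatch every opcode on every run."""
    stack = []
    for opcode, arg in bytecode:
        ROUTINES[opcode](stack, arg)
    return stack.pop()


def compile_to_calls(bytecode):
    """'Compile' once into a stream of direct calls to the op routines."""
    calls = [(ROUTINES[opcode], arg) for opcode, arg in bytecode]

    def run():
        stack = []
        for routine, arg in calls:   # no opcode decoding left, only calls
            routine(stack, arg)
        return stack.pop()

    return run


# (2 + 3) * 4
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
compiled = compile_to_calls(program)
```

Note that the work of each operation still happens inside the shared routines: nothing here sees the whole function at once, so there is nowhere to do register allocation or folding across operations.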
OTOH, both V8 and TraceMonkey are "real compilers" that emit real native code (not CALL streams) for entire functions (or even larger chunks of code, with inlining). This is necessary to enable traditional optimizations like register allocation, instruction scheduling, constant folding, loop unrolling etc. Some of these optimizations can be performed on a high-level intermediate code representation (HIR), but that's typically not worth the effort without real compilation. E.g., loop unrolling will just waste memory and i-cache efficiency if performed by a threaded interpreter/JIT... as the real benefit of unrolling is giving the compiler a much larger basic block to perform other opts like extra folding and bounds-check elimination, or real low-level tricks like exploiting SIMD registers and operations / Instruction-Level Parallelism / prefetching / branch predication etc.
The only reason why V8 and TraceMonkey don't completely 0wn the benchmarks today, is that these JITs are still in their infancy. They have implemented the foundations (like V8's hidden classes or TM's tracing), but they have yet to implement dozens of important optimizations (including very easy ones - they just didn't have the time yet). Check some comments about V8's limitations. TM's developers have also commented on many limitations, to quote Andreas Gal: "If it talks to the DOM during the benchmark, we currently don't compile across such calls (we plan to for Beta2 though)". This and several other improvements are planned for future builds of Firefox 3.1. Notice that items like special support for DOM interactions and event handlers should be critical to some benchmarks - and of course to real-world RIA apps. I'm sure the V8 hackers are also working around the clock to fill in their own gaps. When both VMs are reasonably mature, SFX will have a VERY hard time competing (unless of course, they abandon the context threading model and mutate into a real compiler). Other optimizations, like JITted regex, can be implemented in all VMs and will eventually be ubiquitous.
That would only be true if the CPU is able to retire a sustained average of one instruction per clock cycle. SFGate's article makes a raw comparison between chips with different number of cores, threads and other factors, considering only GHz...
From the article: "Sinclair developed a scheme of assigning multiple BASIC keyword commands for each key, so users would have to press only one key (such as P for "PRINT") instead of typing out the entire command. Using the keyboard to type something that wasn't a BASIC command, however, turned out to be an exercise in frustration. Only masochists had any fun attempting word processing on the Timex Sinclair 1000."
I'm tired of this bashing of the Sinclair-family keyboards! Speaking as somebody who used one for over five years, I tell you that the multi-function keyboard was very efficient, at least for typing BASIC programs of course. Remember that all cheap 8-bit computers had to cut fabrication costs in items like cases, keyboards, power supplies etc; NONE of these machines had a decently built keyboard. With this economic constraint in mind, Sinclair solved two important problems: maximising typing speed for typical usage, and reducing wear-out of cheap keyboard components.
As for common text input, no problem because the ROM input routine was modal. The cursor would be toggled between several modes - it was a "K" for the main BASIC keywords or symbols, "E" for extended ones, "G" for graphics, "L" for letters, "C" for capitals and "?" to flag syntax errors in BASIC lines (an advanced feature, most machines would accept any input and only issue syntax error messages when you tried to run the program!). So you could type in any mode without continuous usage of SHIFT or other mode-changing keys. Another nicety was the embedded color-code input, made in "E" mode IIRC. Once you memorized the several functions assigned to each key, and got used to the modal system, you could type VERY fast compared to any other micro that also had low-quality keyboards but required typing I,N,P,U,T,SPACE for INPUT and so on. (The Sinclair editor didn't require spaces; its BASIC pretty-printer inserted spaces as necessary... and these spaces didn't consume memory, like it happened with other micros, so people would resort to cryptic space-less coding like "FORX=0TO10:PRINTX:NEXTX", while Sinclair users were very proud that their BASIC listings were always readable with canonical spacing.)
P.S.: The model I used was a Brazilian TK-95, a Spectrum 48 clone that had a better keyboard, see photo and article here. This keyboard was among the best in this class of computers, I used it for 5+ years without a single key failing... even though I didn't program much in BASIC, by the end of the first year I was already hacking only in Z80 so I had to type stuff letter by letter. The keyboard was good enough for this and word processing - similar to the C64, but sans stupid layout problems. (I concede that the original rubber keyboard was bad for fast non-BASIC typing, like word processing.)
"Eclipse does some things really well -- taking advantage of being a Java-based editor, it can use RTTI to help in the code-writing process."
;-)
Not true. Eclipse (as well as NetBeans and ALL other Java IDEs) never uses RTTI (i.e. instanceof, reflection and other dynamic / runtime Java features). It does everything it does through parsing the source code. You know, in a decent language like Java (grammar of sane size and complexity, no preprocessor etc.), editors can perform a fast, but precise parsing of your entire projects in real time. And they even do that without monster-sized files - like "precompiled headers" or "source browsing information", common in C/C++ IDEs that struggle to implement a modest approximation of all smart-editing, refactoring and other features of Java IDEs.
Java was open sourced under the GPLv2.
http://www.sun.com/software/opensource/java/faq.jsp#g1
Another thing that bothers me: on Win2K and XP, after every batch of monthly hotfix updates I would delete the hidden Windows/KBxxxx directories that the hotfixes create with their uninstallers and backups of replaced files. I do that because I never, ever, remove any such updates, from small hotfixes to huge service packs (downgrading is for losers)... so, why waste disk space with their uninstallers? Now on Vista I cannot do that anymore, because its Windows/* file organization is orders of magnitude larger, more complex and more obscure than ever before. I tried to identify which files and directories contain hotfix uninstallation data, but it's a mess, it would take me a lot of time to identify those files (I know how to do that, e.g. with ProcMon logs) and I'd still fear breaking the whole OS if I remove or update any files manually.
Well, Vista is so big on the disk that wasting a few dozen Mb with unwanted uninstallers does not seem like a big deal anymore. But it IS a big deal when an OS's base file structure becomes so stupidly complex, that somebody like me (Windows user since 3.0, ex-sysadmin, and skilled software developer) is too scared to touch it with his bare hands...
Complementing your list (also without google's help...)
Brooks, Frederick P.: The Mythical Man-Month, No Silver Bullet paper. Processualist.
Milner, Robin: From the Hindley-Milner type system used in ML and Haskell, and perhaps other functional languages too. Type inference stuff that gives you all the advantages of static typing without most of the type declaration work.
"1984 Wirth, Niklaus: You can call him by value; or you can call him by name." LOL!!
Codd, Edgar F.: Database pioneer, recently RIP.
Hoare, C. Antony R.: And one of the greatest conference speakers I've ever seen, a few years ago at ECOOP. Da man used a few old-style transparent sheets, with hand drawings of the stuff he was talking about (a formalism/algebra to track pointers in ways that should be useful to compilers etc), superposing and moving several sheets together to make animations and special effects. Terribly humiliating for people like me who have to sweat hours on a colorful PowerPoint before making a good presentation.
Here: http://labs.google.com/papers/gfs.html. Abstract: "We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. (...)" Pretty damn cool stuff, very advanced but perhaps a bit too tuned for Google's needs. See also the papers on their own clustering technology and distributed programming framework (MapReduce), which go hand in hand with GFS. These are some of the technical secrets behind Google's search engine and other apps, it's funny how little you hear about them. For massive tasks like Google search, I don't think anybody can compete with standard technology, i.e. big machines (vertical scalability), off-the-shelf clusters and SQL servers, and conventional programming techniques.
...now promised for some post-Vista date. I know both FS'es are not the same thing, but both are supposed to be revolutions in this field, and there are important points in common like transactional semantics and much easier administration of large volumes of data.
I tried the WinFS beta a couple months ago, and was horrified with the amount of RAM it sucks. I got close to 300Mb running only Windows and the WinFS daemons; without any application running, and without using any feature of WinFS. If the best innovation Microsoft can do is installing a full-blown instance of SQLServer disguised as a filesystem, and if ZFS does not do that (I didn't test it), then Sun is pretty comfortable in this particular competition.
Isaac Asimov was ahead of his time.
Slashdot keeps posting articles about "Java killers" even when they are forced to rehash year-old products, articles and so-called benchmarks, that everybody has already seen (and dismissed) one year ago. Another recent example is Curl (which is new and will probably not succeed, but this will not stop Slashdot from posting dozens of "Curl kicks Java" articles in the following couple years until Curl fades into oblivion).
When it comes to performance, I will not mention the obvious all over again (Java not being slow today etc) but rather point out that you can produce an impressive gfx demo even with GW-BASIC, if you have a good interface to OpenGL (or other native, hi-perf graphics library) simply because it's the native code (and better: dedicated HW) that will be doing virtually all the hard work. These demos are worth nothing, unless you can see that the demo app is doing a significant amount of work before piping data to the native engine. But if you are a believer, just fetch JDK1.4.0-beta2 and the JCanyon demo from JavaOne which totally humiliates the tiny "spinning teapot" demos of all "Java killers" I can find.
It is true that iexplore.exe processes are gone without any open browser windows. But lots of MSIE support DLLs are always loaded by explorer.exe; look at its DLLs with NTHandleEx or other process inspector and you'll have some surprises...
1/7 = 0.(142857)*
          ^^
7 being the first natural number whose inverse produces a composite periodic sequence, thus making an infinite sequence of 42's possible, and there they are -- in the very first interesting rational number we can find!!
It's GOT to mean something!!
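For the skeptics, a quick Python sketch that finds the repeating block of 1/n by plain long division (the function name repetend is just my choice here). It confirms that 7 is the first n whose period has more than one digit.

```python
def repetend(n):
    """Return the repeating digits of 1/n, or "" if the decimal terminates."""
    seen = {}        # remainder -> index of the digit it produced
    digits = []
    remainder = 1 % n
    while remainder != 0 and remainder not in seen:
        seen[remainder] = len(digits)
        remainder *= 10
        digits.append(str(remainder // n))
        remainder %= n
    if remainder == 0:
        return ""    # terminating decimal, e.g. 1/4 = 0.25
    # The cycle starts where this remainder was first seen
    return "".join(digits[seen[remainder]:])


# First n whose repeating block is longer than a single digit
first_composite = next(n for n in range(2, 100) if len(repetend(n)) > 1)
```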
For the good doctor's fans out there, he dedicated an entire book (non-fiction) to this very subject.
A Choice of Catastrophes: The Disasters That Threaten Our World
Will this version finally come with the NTFS driver prebuilt (read-only is OK if it's not yet robust), or will the large number of NT+Linux dual-booters be required to config & rebuild the kernel for this single reason, as in all distros? :-(
If and when law enforcement becomes efficient in the Internet, people will just start to use encryption MASSIVELY. Every single thing from IRC to Napster will run on top of very strong encryption, and what are the net-cops going to do then (considering that they've already lost the fight to prevent use/export of encryption software)? Have an entire farm of supercomputers to break every single surfer's packets? This is a very substantial difference from "real world" law enforcement, where people have no access to near-perfect means to do anything they want without leaving evidence.