HPs Dynamo Optimizes Code

← Back to Stories (view on slashdot.org)

Posted by CmdrTaco on Thursday March 23, 2000 @04:10AM from the nifty-tricks dept.

sysboy writes "ArsTechnica have a very interesting piece about HP's Dynamo. " Interesting stuff about their run-time optimization stuff. They compare it to Transmeta's code morphing technology.

2 of 78 comments (clear)

Min score:

Reason:

Sort:

Re:Bad compilers? by David+Greene · 2000-03-22 23:54 · Score: 4

Yes, you are. HP has some of the best compiler people around.
The thing about static compilers is that they have no idea what happens at run-time. Profiling has been used to mitigate this somewhat, but it's still a huge problem.
Accesses through memory are slow, so you want to get rid of them. One way to do this is through register allocation. Unfortunately, even if an infinite number of registers was available, you couldn't allocate everything to registers.
Why? Because we use pointers. There are multiple names for the same data running around in our little electronic brain. When you allocate something to a register, you bind it to one name. This is by definition incorrect for aliased data (data with multiple names).
Optimizations like register promotion try to get around this by allocating things in regions where the compiler can prove it only has one name. But this is exceedingly difficult when you have things like function calls which must be assumed to access and modify lots of data.
I won't even get into the problem of static instruction scheduling or other optimizations like partial redundancy elimination.
In short, aliasing through memory is nearly impossible to track accurately at static compile time. At run-time, the machine knows exactly which memory accesses reference which data, so things like run-time register allocation can do a better job. Crusoe does this to a limited extent.
Dynamo is essentially a software trace cache. Except that when forming the trace, it does transformations like Common Subexpression Elimination and other traditional compiler manipulations.
IBM has the Daisy project, which does code morphing from PPC to a VLIW ISA. I believe it also does some run-time optimizations. Projects like DyC and Tempo have been compiling at run-time for a while now.
I like to think of dynamic compilation in terms of the stock market. Which would you rather do: trade stocks with only limited information about their past behavior (and sometimes none at all), or trade stocks after having observed the absolutely most recent trends? I'll bet that if you pick the first strategy and I pick the second, I'll beat you every time.
That said, there are tricks ou can pull with static compilation. IA64 has the ALAT, which lets the machine track when store addresses match load addresses. This lets the static compiler speculatively move a load ahead of the store. If the store conflicts, the machine will execute some compiler-provided code to fix up the error. Essentially, the compiler is making an assumption that the load and store do not reference the same data and is communicating that assumption to the machine. The machine checks the assumption and invokes some fixup code if it proves to be incorrect.

--

--
Re:Bad compilers? by Cederic · 2000-03-22 23:39 · Score: 4

The review explains that the optimisations performed are those not available to a static compiler. Because the compilers are effectively optimising 'design time' code, there are further optimisations that can be performed on the 'run time' binary.

As an example, consider a fragment of code that performs an operation on a sequence of numbers. The sequence is unknown at design time - bounds checking, conditional statements and other code will all need to be compiled in to ensure different sequences can be handled. At runtime it may be that the same 4 number sequence is processed a large number of times. The runtime optimiser (Dynamo) detects this and can optimise out the bounds checking, the conditional statements and streamline the rest of the code.

It's very clever technology, but it's surprising it hasn't been done more extensively before now. What would be very interesting is to see if something like VMWare incorporated this technology - could it effectively negate the CPU cost of running VMWare itself?

~Cederic