How to Kill x86 and Thread-Level Parallelism
kid inputs: "There's an interesting article discussing how one might go about 'killing' x86. The article details a number of different technological solutions, from a clean 64-bit replacement (Alpha?), to a radically different VLIW approach (Itanium), and an evolutionary solution (Opteron). As is often the case in situations like these, market forces dictate which technologies become entrenched and whether or not they stay that way (VHS vs Beta, anyone?). Another article by the same author covers hardware multi-threading and exploiting thread level parallelism, like Intel's Hyperthreading or IBM's POWER4 with its dual-cores on a die. These types of implementations can really pay off if the software supports it. In the case of servers, most applications tend to be multi-user, and so are parallel in nature."
"Throughput computing"..where the performance is measured not individually but in aggregate.o ug hputcomputing/ for more details.
See their media kit available at
http://www.sun.com/aboutsun/media/presskits/thr
However, I believe the whole idea is nothing new. AFAIK, there are only two ways of increasing the performance of a processor (Operations Per Second) - either increase the IPC (Instructions per cycle) by increasing parallelism or decrease the cycle time by increasing the clock Rate (Ghz).
Each method has its limits and follows the law of diminishing returns - for e.g. increasing the clock rate implies increasing the number of stages in the pipeline...and after say 10000 stages, the penalties imposed due to flushing the pipeline might compensate for the increased GhZ. Similarly if you manage to place 100000 cores on a chip, scheduling amongst these cores and providing realtime access to the memory for all these cores will become the bottleneck. Hence, I take statements like "how to kill the x86" with a pinch of salt.
Finally, it will the fabcrication (physical) technology that decides which one of these dies. For e.g. if tomorrow someone is able to come up with a process that enables 100Ghz chips at the (think extensions of SOI etc) decreasing the cycle time will win. Similarly, if someone comes out with femto (10^-15 ) metre fabrication technology, then parallelism will win.
The neat hardware implementation of this would be to make all MOV instructions take nearly the same time, regardless of the amount of data moved. A MOV should result in a remapping of the source and destination memory in the cache system. Even if this were just implemented for aligned moves, it would be a big help. When your application's 8K buffer needs to be copied to the file system, that copy should be done by updating cache control info, not by really doing it.
With this, windowing becomes far simpler. Each window is maintained locally. Shared window management is reduced to screen space allocation, which is done by commanding the window MMU.
The problem with keeping the x86 architecture and the ISA are that it is carrying around legacy burdens from the 286. Even the p4 still boots into real address mode at boot up, and has to be PUT into protected mode. There are hundreds and hundreds of instructions, over 100 registers (but still only 8 GPRs), many of which overlap in purpose or are used for entirely non-intuitive purposes (CMPX EAX, EAX). x86 is ready, at the least, for a real version 2, that isn't afraid to break compatibility in order to add major architectural advances (I wouldn't mind a register ring :).
====
Crudely Drawn Games
The power 5 will have 2 cpus on a die, and they both will behave like hyperthreading intel cpus.
so each 'cpu' will look like 4 logical cpus
PHP is the solution of choice for relaying mysql errors to web users.
Programs do the exact same thing on every run. Jumps are jumps, they can be followed.
If only it were that simple. Ever heard of a "computed goto"?
IIRC, Apple did this with their Dos Compatible Macs.... You could run both DOS and MacOS at the same time, and had some bios function that switched between the two.