RISC Vs. CISC In Mobile Computing
eldavojohn writes "For the processor geeks here, Jon Stokes has a thoughtful article up at Ars Technica analyzing RISC vs. CISC in mobile phones (Wikipedia on Reduced Instruction Set Computers and Complex Instruction Set Computers). He wraps it up with two questions: 'How much is the legacy x86 code base really worth for mobile and ultramobile devices? The consensus seems to be "not much," and I vacillate on this question quite a bit. This question merits an entire article of its own, though,' and 'Will Intel retain its process leadership vs. foundries like TSMC, which are rapidly catching up to it in their timetables for process transitions? ARM, MIPS, and other players in the mobile device space that I haven't mentioned, like NVIDIA, AMD/ATI, VIA, and PowerVR, all depend on these foundries to get their chips to market, so being one process node behind hurts them. But if these RISC and mobile graphics products can compete with Intel's offerings on feature size, then that will neutralize Intel's considerable process advantage.'"
There are no CISC CPUs anymore. There are RISC CPUs with RISC instruction sets (e.g. ARM) and there are RISC CPUs with CISC instruction sets (e.g. x86). The cores are mostly the same, except that the chips with CISC instructions need to do a little more work in the decoder. That costs a few extra transistors and a bit more power, but it's not a huge deal for PCs and servers. Of course, for embedded applications it makes a difference, and there it makes sense to have more "specialised" architectures (from microcontrollers to DSPs, ARM and all kinds of hybrids).
Opus: the Swiss army knife of audio codec
RISC vs. CISC? What is this, the early 90's? There are no RISC chips anymore, except as product lines that were originally developed with the RISC methodology in mind. Similarly, true CISC doesn't exist either. Microcode has done wonders in turning complex instructions into a series of simpler instructions like one would find on a RISC processor.
The author's real point appears to be: x86 vs. Other Embedded Architectures. Without even looking at the article (which I did do), it's not hard to answer that one: There is no need for x86 code in a mobile platform. The hardware is going to be different than a PC, the interface is going to be different than a PC, and the usage of the device is going to be different than a PC. Providing x86 compatibility thus offers few, if any, real advantages over an ARM or other mobile chip.
If Intel's ATOM takes off, it will be on the merits of the processor and not on its x86 compatibility. Besides, x86 was a terrible architecture from the get-go. There's something mildly hilarious about the fact that it became the dominant instruction set in Desktop PCs across the world.
Javascript + Nintendo DSi = DSiCade
The iPhone makes any other mobile device pointless. Why even bring this topic up?
Is not the only question. What about how much are the Intel and AMD brand names worth?
Choosing the lesser of two evils is a choice for evil.
RISC vs CISC was the architecture flamewar of the late 1980s. Welcome to the 21st century, you'll like it here. It's a world where, since the late 90s, the ISA (instruction set architecture) has been so abstracted away from the actual micro-architecture of the microprocessor as to make it completely pointless to distinguish between the two. Modern processors are RISC, they are CISC, they are vector machines, they're everything you want them to be. Move on: the modern problems are in multi-core architectures and their issues of memory coherence, cache sharing, memory bandwidth, interlocking mechanisms, uniform vs non-uniform memory access, etc. The "pure RISC" standard bearers of yore have disappeared or have been expelled from the personal computing sphere (remember Apple ditching PowerPC? Alpha, anyone? Where have those shiny MIPS-based SGIs gone?). Even Intel couldn't impose a new ISA on its own (poor adoption of IA-64). The only RISC ISA that has any existence in the personal computing arena, including mobile, is ARM, but they do only mobile. There's really no reason at all to build any device on which you plan to run generic OSes and a rich computing experience on anything other than x86 or x86-64 machines.
Intel successfully killed the high-end CPU manufacturers. However, recently they have had poor performance in the very low power arena. Their main offering (XScale, until they sold it) was poor compared to the competitors. Compare the Intel PXA27x to the Philips LPC3180. The Philips chip has about the same instruction rate for integer instructions (at half the clock rate), hardware floating point (so it's about 5x as fast at this) and draws about 1/5 of the power. I know which one I prefer...
Unlike the old RISC workstation manufacturers, which relied on a small market of high-margin machines, the current embedded CPU manufacturers operate in a huge, cut-throat world where they need to push performance per dollar as high as possible to maintain a lead. I think this market will be somewhat tougher to crack than the workstation market, since Intel does not have what they had before: an advantage in volume shipped.
SJW n. One who posts facts.
The only "benefit" that has come of having x86 processors in MIDs so far has been seeing the developers cram Vista on an already slow device, making it crawl even worse. Or they stick XP on it, packing an OS completely not designed for MID use on it.
Using ARM on mobile platforms at least offers some hope of making a clean break from all the backwards compatibility cruft that x86 has dragged along with it for decades now.
I've had to work with the ARM ISA in the past (I was studying its implementation as a soft core on an FPGA), and I can tell you it doesn't follow the RISC philosophy well, if at all.
One very non-RISC thing ARM did was move the shift instructions into every arithmetic instruction. That's right: there are no dedicated shift instructions. When you need a shift instruction, you have to encode it as part of a move operation or an add. In effect, every add, and, or, sub, etc. is actually an add+shift, and+shift, or+shift, etc. This is the opposite of the RISC philosophy, and it significantly complicates the hardware, since a variable shifter has to be on the ALU critical path.
Other non-RISC things ARM did include the Java instruction set extensions, the Thumb instruction set extensions (to further reduce code size), vector and media instructions, etc.
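To make the add+shift point concrete, here is a rough sketch in Python rather than ARM assembly. The function names are made up for illustration and only model a 32-bit `ADD Rd, Rn, Rm, LSL #n`; it is a toy, not a description of any actual core.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left on a 32-bit value (the barrel shifter's job)."""
    return (value << amount) & MASK32

def add_shifted(rn, rm, shift=0):
    """Model ADD Rd, Rn, Rm, LSL #shift: the second operand always
    passes through the shifter before it reaches the adder."""
    return (rn + lsl(rm, shift)) & MASK32

def mov_shifted(rm, shift):
    """A bare 'shift' is just MOV Rd, Rm, LSL #shift."""
    return lsl(rm, shift)

# Index into an array of 4-byte words: base + (i << 2) in one instruction.
base, i = 0x1000, 3
print(hex(add_shifted(base, i, 2)))  # 0x100c
```

A plain add is simply the shift-by-zero case, which is why the shifter sits in front of the ALU on every data-processing instruction.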
I think calling ARM "RISC" is a marketing decision only, done for historical reasons. It doesn't have much to do with the technical reality, IMO. Jon Stokes would have done better to say ARM vs. x86, instead of RISC vs. CISC, which is an outdated idea back from the 80s & 90s.
They just aren't very important distinctions anymore.
Both refer to the instruction sets, not the internal workings. x86 was CISC in 1978 and it's still CISC in 2008. ARM was RISC in 1988 and is still RISC in 2008. AMD64 is a borderline case.
People get confused by the way current x86s break instructions apart into micro-ops. That doesn't make it RISC. That just makes it microcoded. That's how most CISC processors work. RISC processors rarely use anything like microcode, and when they do, it is looked upon as very unRISCy.
Today, the internals of RISC and CISC processors are so complex that the almighty instruction set processing is barely a shim. There are still some advantages to RISC but they are dwarfed by out-of-order execution, vector extensions, branch prediction and other enormously complex features of modern processors.
RISC architecture, interestingly, makes things hella fast. The decoder stage follows an easy path; conditionals occur as prefixes, just like cmov in i686 land. This means when the chip makes an instruction fetch, it does pretty much no extraneous work; it just wanders through a couple paths (a decision to execute or skip, and then into execution or back to fetch) in one go. Modern CPUs do a hell of a lot of work just to decide how to handle an instruction.
Support my political activism on Patreon.
The only real point in x86 is Windows compatibility. Linux runs fine on ARM and many other architectures. There are probably more ARM Linux systems than x86-based Linux systems (all those Linux cellphones run ARM).
Apart from some very low level stuff, modern code tends to be very CPU agnostic.
Engineering is the art of compromise.
There's no distinction between the two any more, and hasn't been for a long time. The whole point of RISC was to simplify the instruction format and pipeline.
The problem these days is that it doesn't actually cost anything to have a complex instruction format. It's such a tiny, isolated piece of the chip that it doesn't count for anything, it doesn't even slow the chip down because the chip is decoding from a wide cache line (or multiple wide cache lines) anyway.
So what does that leave us with? A load-store instruction architecture versus a read-modify-write instruction architecture? Completely irrelevant now that all modern processors have write buffer pipelines. And, it turns out, you need to have a RMW-style instruction anyway, even if you are RISC, if you want to have any hope of operating in an SMP environment. And regardless of the distinction, CPU architectures already have to optimize across multiple instructions, so again the concept devolves into trivialities.
Power savings are certainly a function of the design principles used in creating the architecture, but it has nothing whatsoever to do with the high level concept of 'RISC' vs 'CISC'. Not any more.
So what does that leave us with? Nothing.
-Matt
As I read the previous posts, it seems like the focus is on RISC vs CISC, but I think the real question is: is there any value-add for designers in having an x86-compatible embedded microcontroller?
People (and not just us) should be asking would end customers find it useful to be able to run their PC apps on their mobile devices? Current mobile devices typically have PowerPoint and Word readers (with maybe some editing capabilities) but would users find it worthwhile being able to load apps onto their mobile devices from the same CDs/DVDs that were used to load the apps onto their PCs?
If end customers do find this attractive, would they be willing to pay the extra money for the chips (the Atom looks to require considerably more gates than a comparable ARM) as well as for the extra memory (Flash, RAM & Disk) that would be required to support PC OSes and apps? Even if end customers found this approach attractive, I think OEMs are going to have a long, hard think about whether or not they want to port their radio code to the x86 with Windows/Linux when they already have infrastructures built up with the processors and tools they are currently using.
The whole thing doesn't really make sense to me, because if Intel wanted to be in the MCU business, then why did they sell that business off to Marvell (along with the mobile WiFi technology)?
The whole thing seems like a significant risk: customers may not want products built from this chip, and Intel and OEMs would need to recreate for the x86 Atom the infrastructure they already have for existing chips (i.e. ARM).
myke
Mimetics Inc. Twitter
I never delved too far into the RISC vs CISC debate, but my understanding is that RISC uses a small number of simple, generic instructions that execute very quickly, and the compiler builds functionality upon those tiny building blocks. CISC uses a larger number of specialized instructions, each one doing a larger amount of "work" as one black box, where the RISC chip would break that down into several smaller tasks. Since RISC executes faster, overall performance is still good.
So my point is: if RISC needs more instructions to do the same work, does it require a higher clock frequency to achieve similar performance to a CISC chip? Since clock speeds do not scale to infinity, this implies that a RISC chip will hit the frequency wall sooner, thus limiting its maximum speed.
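One way to frame that question is the usual back-of-envelope relation: run time = instruction count x cycles per instruction / clock frequency. A minimal sketch, with numbers that are purely hypothetical and picked only to show how the terms trade off (they are not measurements of any real chip):

```python
def run_time(instructions, cycles_per_instruction, clock_hz):
    """Execution time in seconds: instructions * CPI / clock."""
    return instructions * cycles_per_instruction / clock_hz

# Hypothetical numbers, chosen only to illustrate the trade-off.
# Same clock for both; the RISC-style program needs 50% more instructions
# but each one takes fewer cycles on average.
cisc = run_time(instructions=1_000_000, cycles_per_instruction=4.0, clock_hz=1e9)
risc = run_time(instructions=1_500_000, cycles_per_instruction=1.2, clock_hz=1e9)

print(f"CISC-style: {cisc * 1e3:.2f} ms")
print(f"RISC-style: {risc * 1e3:.2f} ms")
```

Under these assumed figures, more instructions does not automatically mean a higher clock is needed; whether it does depends on how far cycles per instruction drop.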
Much of the work done by a modern Intel CPU involves clever decoding, caching and scheduling, to extract as much parallelism as possible from the x86 instruction set. If you were to somehow disable all the prefetching, hyper-threading, predictive branching and all the other bullshit that isn't directly tied to x86 decoding and execution, that Core 2 chip would be no better than a superclocked 386. That "bullshit" works hard to alleviate or outright negate many of CISC's weaknesses.
The simplicity of a RISC design leads to excellent production cost advantages and remarkable power efficiency, because there's a lot less "bullshit" on the die. Low cost + moderate performance + high efficiency = embedded nirvana. That's why we see them in cell phones, RAID controllers, microwave ovens, TVs, etc.
Meanwhile, in PC land, things are more expensive, and performance is king. Nobody wants a slow laptop, because we have work to do, otherwise we wouldn't buy the stupid laptop in the first place. We also want the laptop to sync with our desktop, run the same apps and hook up to the office network. It's bad enough that we have to work through the flight (or bus ride), we don't want to run (and have to learn) heterogeneous platforms.
RISC will continue to reign in small, cheap, battery-powered gadgets. That's what it does best, by design and in practice. That's its turf, where big bad CISC will not dare tread, not even their redheaded stepchild Atom.
-Billco, Fnarg.com
As many other people have noted, the classical RISC vs. CISC debate is moot, since all modern processors have elements of both. The real fight in the mobile space now is between general-purpose low-power processors, which may consume more power when doing certain computationally expensive tasks, and processors with specialized acceleration units that optimize certain tasks, at the expense of performance and power efficiency for general-purpose work.
I suspect that in the near future, the more interesting fight will be between chips with media acceleration features, and separate offload chips that allow the device to handle media decode with a very low-power general-purpose chip that otherwise wouldn't be up to the task.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
How is the parent post offtopic?
The article mentions that Intel killed the old workstation RISC vendors. Parent posts suggests why this is not so easy for Intel.
And further, how is Intel having a poor track record for embedded processors compared to other manufacturers offtopic for an article about Intel producing embedded processors?
SJW n. One who posts facts.
The mobile world does lots of wireless communication (my cellphone alone does Bluetooth v1/2, IR, GSM and WiFi). The chips best suited for these wireless communication tasks are DSP / CISC chips. Given that there are lots of wireless communication tasks, I foresee DSPs / CISC chips handling a significant percentage of the work.
BUT the DSPs / CISC chips won't ever REPLACE the General Purpose Processors / RISC chips. Most cellphones today are actually a combo platform with a General-Purpose-Processor that does the user interface (and other related tasks) and a DSP (to do the wireless communication and heavy lifting tasks).
The biggest advantage might be making it easier to use old applications, and of those old apps the most desirable would seem to be the mountains of VGA or lower resolution DOS games (because the screens on mobile devices are smaller, so high res is not as usable). I assume that if they ran some version of an x86 chip, then creating a VMware-like emulation would be simpler, and the manufacturer could relicense/resell a ton of older games to a generation that had never played them.
On the other hand, why haven't wearable, glasses-style displays taken off for these things? I would love to have that for playing games or movies on plane trips.
""
The problem these days is that it doesn't actually cost anything to have a complex instruction format. It's such a tiny, isolated piece of the chip that it doesn't count for anything, it doesn't even slow the chip down because the chip is decoding from a wide cache line (or multiple wide cache lines) anyway.
""
The problem with your assumption is that it's _wrong_.
It does cost something. The WHOLE ARTICLE explains in very good detail the type of overhead that goes into supporting x86 processors.
The whole point of ATOM is Intel's attempt to make the ancient CISC _instruction_set_ work on an embedded-style processor with the performance to handle multimedia and limited gaming.
The overhead of CISC is the complex arrangement that takes the x86 ISA and translates it to the RISC-like chip that Intel uses to do the actual processing.
When you're dealing with a huge chip like a Xeon or Core2Duo, with a huge battery or connected directly to the wall, it doesn't matter. You're taking a chip that would use 80 watts TDP and going to 96.
But with an ARM platform you not only have to make it so small that it can fit in your pocket, you also have to make the battery last at least 8-10 _hours_.
This is a hell of a lot easier when you can deal with an instruction set that is designed specifically to be stuck in a tiny space.
If you don't understand this, you know NOTHING about hardware or processors.
The workstation market was increasingly flooded with users who were looking for the only thing they recognized: a machine running Windows with the associated software.
Given that Microsoft's presence and quality on non-Intel architectures was a joke at best, all of these newcomers went for Windows-capable architectures. Then, with market economics doing their work, the price of x86 hardware fell enough to interest the rest of the workstation people too: an off-the-shelf x86 processor running Linux became a cheaper alternative to some proprietary UNIX running on some obscure RISC architecture.
Nowadays the situation is radically different. What users of those ultra-light machines expect isn't to carry "Microsoft Vista and TurboTax" in their pocket, but rather to have a pocketable internet, which doesn't depend that much on a specific software vendor. That is very well illustrated by the success that Linux-based subnotebooks such as the EEE PC are enjoying. And Linux has the big advantage of being very easy to compile cross-platform, particularly the kernel, which has already seen massive use in embedded devices such as routers, modems, etc. powered by MIPS and ARM chips.
Thus small devices can very well accommodate ARMs. Intel has no real advantage, except maybe for a couple of key technologies from vendors too lazy or too talentless to port their code to other architectures (like Flash). Intel's only hope is in Flash being such a killer feature that it will require the x86 ISA.
Microsoft being desperate to push Windows XP for subnotebooks, even though it is due to be deprecated soon, is a clear indicator of this tendency.
Intel themselves got caught at their own ISA lock-in when they tried to launch Itanium. It mostly tanked because it lacked ports of key software (and because non-ported code was very slow).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
The real question is whether ARM thumb instructions have higher code density than x86 instructions for infrequently executed code. Instruction bandwidth is far more precious than the execution complexity on-die (chip IOs toggling far outweigh any decoder logic), so for mobile it's really about how efficiently you can compress the instructions, not what kind of architecture they are based on. I'm guessing ARM still comes out ahead, but it would be an interesting experiment to run...
x86, which is the classic CISC, is also a variable-length ISA. That means certain instructions take just a single byte to encode, compared with a fixed 4 bytes on the most common RISCs. This can be a factor in instruction cache size/effectiveness. Fewer bytes for instructions == more instructions fit in the ICache == ICache is more effective. I don't have any numbers, but I would expect the average instruction length on CISC to be in the 10s of % smaller than RISC. That means either greater performance, or lower power, or both. Perhaps enough of each to offset the greater power/die area required to decode these variable-length instructions.
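For a feel of how encoding size turns into ICache capacity, here is a quick sketch. The cache size and the 3-byte average are assumptions picked for illustration only (as noted above, there are no measured numbers here), and `instructions_in_cache` is just a made-up helper name.

```python
def instructions_in_cache(cache_bytes, avg_instruction_bytes):
    """How many instructions fit, ignoring alignment and line boundaries."""
    return int(cache_bytes // avg_instruction_bytes)

ICACHE_BYTES = 32 * 1024   # assumed 32 KiB instruction cache
FIXED_LEN = 4.0            # fixed 4-byte RISC encoding
VARIABLE_LEN = 3.0         # assumed average for a variable-length ISA

fixed = instructions_in_cache(ICACHE_BYTES, FIXED_LEN)
variable = instructions_in_cache(ICACHE_BYTES, VARIABLE_LEN)

print(fixed, variable)
print(f"{(variable / fixed - 1) * 100:.0f}% more instructions resident")
```

This only counts instructions; it says nothing about how much work each instruction does, which is the other half of the comparison.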
It's not a big factor, but combined with the other points that have been made in this thread (how CISCs translate to RISC internally, the advantages of using a very mature x86 OS/software stack, etc.), there are a number of reasons why an x86 embedded processor might be good.
Every 4+ comment has the same "RISC|CISC is dead" comment talking about how x86 chips break down that massive, warty ISA into a series of RISC-like micro-ops for internal consumption. And that this has been the case since at least the Pentium Pro.
Read the article. Jon Stokes makes that point: but he also makes the point that in embedded processors, it does matter, because the transistor budget is much, much smaller than for a modern desktop CPU. It may come to pass in a few generations of die feature shrinking that we arrive back at the current situation of ISAs becoming irrelevant, but for the moment in the embedded space it does matter that you need to give up a few million transistors to buffering, chopping up and reissuing instructions compared to just reading and running them.
Remember, this is Jon Stokes we're talking about: he's the guy that taught most Slashdotters what they know about CISC and RISC as it is.
Classical Liberalism: All your base are belong to you.
Intel tried that experiment with XScale and eventually sold it off to Marvell.
I understand the theory--you simplify instructions, do things to speed up the processor so it can run faster, then optimize the processor to run as fast as you can.
In other words, you are designing your instruction set to your hardware.
Now, assuming that you are going to have close to infinite investment into speeding up the CPU, it seems that if you are going to fix an instruction set across that development time, you want the instruction set that is the smallest and most powerful you could get it.
That way, for the same cycle, instead of executing one simple instruction you are executing one powerful one (that does, say, 5x more than the simple one).
Now at first the more powerful one will take more time than the simple one, but as the silicon becomes more powerful, the hardware designers are going to come up with a way to make it take only 2x as long as the simple one. Then less.
I guess I mean that you will get more relative benefit tweaking the performance of a hard instruction than an easy one.
Also, at some point the Memory to CPU channel will be the limit.
I'd kinda like to see Intel take on an instruction set designed for the compiler rather than the CPU (like Java Bytecode). Bytecode tends to be MUCH smaller--and a quad-core system that directly executes bytecode, once fully optimized, should blow away anything we have now in terms of overall speed.
The distinction you seem to be trying to draw here is not very sound. Modern CPUs "translating instructions into hardware instructions" with a gate maze is essentially the same thing as pulling a wide microcode word from ROM whose bits directly control the logic units. In both cases you put some bits in to start the process off, and you get a larger number of bits as a wide bus of signals out, which are used to direct traffic inside the CPU. The picture only looks
Specifically, the different parts of each microcode instruction executed in parallel then, just as now, though out of order execution was much rarer (some DSPs had it IIRC). This was not because microcode as it was then conceived couldn't handle it, but that the in-CPU hardware to support it wasn't there. There's no point going through gymnastics to feed your ALU if you've only got one and it's an order of magnitude slower than the circuit that feeds it.
One of the biggest annoyances of staying in any one field for too long is having to watch some technology following the logical path from conception to fruition go through an endless series of renaming (AKA jargon upgrades) that add nothing but confusion and pomposity to the field.
--MarkusQ
Ars Technica has a Flash advertisement that consumes 99% CPU time.
If your device is essentially an FPGA, then implementing the app-specific stuff in programmable logic and throwing in an ARM IP core keeps the component count down.
I don't think you can get "Atom" as a VHDL IP core.
You only need a half dozen or so instructions to program the whole thing in brainfuck anyway.
Nullius in verba
It should have been called something like "Atom architecture overview, its future, and how it compares to ARM".
And to all those that rant how RISC is dead: Did you actually RTFA?
VPS-like shared hosting, on under-crowded servers.
mod parent up!
While this was by far the most common sort of implementation, it wasn't what drove the definition. Many factors can affect how things ultimately get laid out on the silicon, and nobody ever said "well, we thought it was going to be microcoded but the ROM area wound up L-shaped instead of rectangular, so I guess it isn't."
What drove the definition was what differentiated microcoded architectures from their peers and predecessors--the explicit use of a systematic way to organize and sequence the control lines (and there was some overlap and blur around the edges--ad hoc systems with "meta-control lines," gate arrays, RAM, and even demultiplexers instead of ROMs, etc.) to permit the design of more complex instructions. Because they were systematic, such systems could be written down like code instead of being laid out like circuits (which they ultimately were), and thus the name.
Microcode is a way of designing (and thinking about) a CPU, not at the end of the day a way of implementing one. You could take a fully specified microcoded architecture and opportunistically replace some or all of the microstore with a gate maze without affecting its formal behaviour. Since the result would often be smaller, faster, and use less power, this was commonly done.
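A toy illustration of that equivalence, assuming a made-up machine with a 2-bit opcode and four control lines (none of these names come from a real CPU). The same control signals come out whether you look them up in a "microstore" table or compute them with boolean logic on the opcode bits:

```python
# Control lines, one bit each (names are made up for this toy machine).
ALU_ADD, ALU_SUB, MEM_READ, REG_WRITE = 1, 2, 4, 8

# "Microstore": a ROM indexed by opcode; each word drives the control lines.
MICROSTORE = {
    0b00: ALU_ADD | REG_WRITE,   # ADD
    0b01: ALU_SUB | REG_WRITE,   # SUB
    0b10: MEM_READ | REG_WRITE,  # LOAD
    0b11: 0,                     # NOP
}

def control_from_rom(opcode):
    """Look the control word up in the table."""
    return MICROSTORE[opcode]

def control_from_gates(opcode):
    """Same truth table, written as AND/NOT logic on the opcode bits."""
    b1, b0 = (opcode >> 1) & 1, opcode & 1
    n1, n0 = b1 ^ 1, b0 ^ 1          # inverted inputs
    add = n1 & n0
    sub = n1 & b0
    read = b1 & n0
    write = (b1 & b0) ^ 1
    return (ALU_ADD * add) | (ALU_SUB * sub) | (MEM_READ * read) | (REG_WRITE * write)

for op in range(4):
    print(op, control_from_rom(op) == control_from_gates(op))
```

From outside the decoder the two are indistinguishable, which is the sense in which swapping microstore for gates leaves the formal behaviour alone.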
--MarkusQ
uh huh huh he said vacillate
Having to work for a living is the root of all evil.
Religion and design aesthetics aside, the commercial advantage of RISC was entirely a consequence of the fab process capabilities of the time RISC was introduced. At any given process, you can fit only so many gates on a die before yield goes to hell completely and only the government can afford to buy the chips. RISC uses fewer gates than CISC because it's a much simpler decode. So at some process you can fit a reasonable RISC processor on a cheap enough chip where you couldn't fit the same capability in CISC on the same chip. Alternatively and equivalently, you could fit a dinky CISC or a more capable RISC on the chip. That edge process happened about 1980, and lasted for two fab generations ending around 1985. Then the fabs got good enough that you could fit either CISC or RISC, your choice, and the commercial advantage of RISC ISAs went away. The formerly dinky CISC got jazzed up by true designer heroics, and today you choose an architecture based on other reasons. There remain a few markets where cheap (power and area) decode still matters enough to influence dollar decisions. Mobile is one, and CISC as she is spoke in x86 is at a disadvantage. However, most posters seem to assume that CISC necessarily has expensive decode, and that CISC == x86. Both assumptions are false.
Sounds good -- do bytecode execution, as bytecode is optimized for compilation...
But - turns out bytecode directly executed is in the same ballpark as "regular" instructions. Doesn't really gain much. (sorry, can't cite)
The reason(s)?
- programming languages following instruction conventions (example: C). C is simple, and follows a "PDP-11" model
- programming languages not expressive enough, unless they are profiled (examples: C, Java). No way to mark code "rarely used", No way to indicate parallelism.
The old CDC 6000 Pascal implementation gave us a type "alpha" which was a packed array of 10 characters. Of course that machine had a 60 bit word size, and used a 6 bit character representation. The alpha type was a string that fit into a machine register. Comparing these strings was as efficient as integer comparison.
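Roughly how that packing works, sketched in Python. The 6-bit character table below is invented for the example (it is not the real CDC display code), and `pack_alpha` is a hypothetical helper name. The first character goes in the top bits, so comparing two packed words as integers gives the same answer as comparing the space-padded strings.

```python
# Made-up 6-bit code, ordered like ASCII for these characters.
ALPHABET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODE = {ch: i for i, ch in enumerate(ALPHABET)}
WIDTH, BITS = 10, 6   # 10 characters x 6 bits = one 60-bit word

def pack_alpha(text):
    """Pack up to 10 characters into one 60-bit integer, first char in the top bits."""
    if len(text) > WIDTH:
        raise ValueError("alpha holds at most 10 characters")
    word = 0
    for ch in text.ljust(WIDTH):      # pad with spaces, the lowest code
        word = (word << BITS) | CODE[ch]
    return word

a, b = pack_alpha("ALPHA"), pack_alpha("MIPS")
print(a < b)                  # True
print(a.bit_length() <= 60)   # True
```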
How do you do something like this in C? The program COULD note that a particular string will not exceed a limit, but it could never hard-code that into machine code, unless an exception mechanism that was less expensive than a comparison could be put into place. Altogether a nasty problem (either for the CPU, or for the compiler writer).
Just another "Cubible(sic) Joe" 2 17 3061
> But - turns out bytecode directly executed is in the same ballpark as "regular" instructions. Doesn't really gain much. (sorry, can't cite)
Not sure what you mean. If you are saying that a CPU running bytecode tends to run your code as fast as a CPU running assembly, that's what I'm assuming. But the CPUs running bytecode haven't been optimized as much as the Intel CPUs.
Also, we could put more powerful instructions into the bytecode if they tend to execute too quickly. Stuff like complete control over memory management, garbage collection, etc. Each feature that would be "Kernel" could be moved into the CPU allowing even more for the hardware engineers to optimize.
You would start to lose C-style pointer functionality at some point, but you could even gain speed doing so because now the CPU can actually optimize things a compiler used to have to optimize.
We already have cases where Java can outperform C and even unoptimized assembly because they don't know enough about the code they are running.
For instance, if Java is calling a routine and that routine's data hasn't changed, Java can flag that routine and stop calling it until its data changes. Of course you could do that in assembly, but Java will do it automatically for every routine in your program whose data it can isolate.
I believe that Scala could even do more with this kind of optimization.
Same with garbage collection. In C you allocate, use, then de-allocate. In Java, the VM never wastes time on the de-allocation step for the vast majority of its allocations. (See "eden" in any recent garbage collection white-paper).
Microsoft has always coded part of Excel in bytecode. Not for speed but for code-size. If it makes that much of a difference, wouldn't code-size eventually become a transfer issue?
I just think RISC is going the wrong way when you take processor improvements into consideration.
But, look at the WAY that Java gets faster than C: by interpreting and being introspective (in your example, detecting idempotent-like functions). Doesn't matter if the processor is executing Java bytecode or not. Indeed, HP has been inserting a PA-RISC interpreter into binaries to pick up similar gains.
As to the idea of higher order functions -- it's been tried. As an example, the Intel architecture has "task switch" segments. Not used, because it turns out that the incomplete case is faster. Or, examine the performance of the generic VAX call instruction. Or the idea of BCD arithmetic modes. Ideas that seemed good at the time... At a higher level - how would garbage collection be assisted? GC is now down to:
Get memory: if sufficient space, advance pointer and return
Which, in machine terms is: compare, branch_greater, add, return. Comes to four instructions. The GC operation itself is dependent on a bunch more stuff, that you (probably) don't want to bring in.
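That fast path is the classic bump-pointer allocation. A minimal sketch, where `BumpAllocator` is a hypothetical name and returning `None` stands in for whatever the VM would really do when the space runs out (typically trigger a collection):

```python
class BumpAllocator:
    """Toy nursery: allocation is a bounds check plus a pointer bump."""

    def __init__(self, size):
        self.top = 0          # next free offset
        self.limit = size     # end of the region

    def alloc(self, nbytes):
        new_top = self.top + nbytes
        if new_top > self.limit:   # compare + branch_greater
            return None            # slow path: a real VM would collect here
        addr = self.top
        self.top = new_top         # add
        return addr                # return

nursery = BumpAllocator(64)
print(nursery.alloc(16), nursery.alloc(16), nursery.alloc(40))
```

There is no per-object free step at all, which is why there is so little left for a special instruction to speed up.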
Scheduling? Where would the policy be implemented? I could see the return of "io channel programs" in order to assist the use of co-processors. But that doesn't change the instruction set architecture much.
I am not really weighing in on "RISC vs CISC" here. Generally, I prefer RISC, simply because manipulating code from a program tends to be easier. TFA indicates that the silicon to decode CISC is significant in low-power systems. I was indicating that building a Java bytecode implementation doesn't gain much.
Also, we could put more powerful instructions into the bytecode if they tend to execute too quickly. Stuff like complete control over memory management, garbage collection, etc. Each feature that would be "Kernel" could be moved into the CPU allowing even more for the hardware engineers to optimize.
You would start to lose C-style pointer functionality at some point, but you could even gain speed doing so because now the CPU can actually optimize things a compiler used to have to optimize.
We already have cases where Java can outperform C and even unoptimized assembly because they don't know enough about the code they are running.
Just another "Cubible(sic) Joe"
I hate the way RISC is embraced by many as a pretext to operate their brain in neutral.
Consider the x86 read-modify-write address mode, which few RISC chips incorporate.
Compared to the RISC load, operate, store sequence, this saves you: two instruction fetches, the unnecessary use of a named register, unnecessary read/write transfers to the permanent register file, and transferring the *same* load/store address to the memory order buffer *twice*.
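To make the comparison concrete, here is a one-line C function with the kind of code a compiler might emit for it shown in comments. The function name is hypothetical, the load/store assembly uses made-up mnemonics and register names, and the actual output depends on the compiler, flags, and calling convention.

```c
/* Adds delta to the int that p points at: a read-modify-write on memory. */
void add_in_place(int *p, int delta)
{
    *p += delta;
    /* Plausible x86-64 output (one instruction, 2 bytes of code):
     *     add dword ptr [rdi], esi
     *
     * Plausible output for a load/store ISA with fixed 4-byte
     * instructions (three instructions, 12 bytes of code, one extra
     * named register, and the same address sent to memory twice):
     *     load  r2, [r0]
     *     add   r2, r2, r1
     *     store r2, [r0]
     */
}
```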
Fixed width RISC instruction sets with large register files and full orthogonality typically have terrible code density. This directly reduces the efficiency of the icache, which in modern chips represents far more transistors than RMW address mode support introduces into the core.
ARM is a good study in this. What did ARM end up doing to address their code density / speed problem? They invented Thumb and then Thumb-2, whose 16-bit encodings mostly give up half the register set to achieve a better balance in time and space.
http://www.arm.com/products/CPUs/archi-thumb2.html
Some of the original RISC ideas might have been good at the time, but times change.
Excessive orthogonality is simply wasteful. Modern compilers don't have great problems with register coloring to achieve efficient code generation. Yes, someone once had to think very hard to code this. How cruel.
Orthogonality is most beneficial to programmers writing assembly by hand, because people aren't nearly so good as compilers at remapping all the registers in a loop every time one instruction in the loop changes.
An excessively large register file can become a liability at call boundaries and context switches. The x86 addressing modes permit more aggressive use of the L1 dcache as an adjunct to the paltry register file. Otherwise, it would have fallen out of the game long ago.
In some areas, x86 is pure liability: the instruction prefix bytes, too many partial flag register updates, a terrible floating point architecture, and a museum of dead and useless 286 instructions which seemed like a bright idea at the time (I think the PHB interned at Intel that year).
In terms of power efficiency, the x86 instruction encoding and paltry register file can't be redeemed. The magic to make these problems disappear costs a lot in power.
But it's stupid to say that x86 has no advantages over traditional RISC. It has plenty, if you stop to think about it.