Variable Instruction Computing: What Is Old Is New Again (hackaday.com)

Posted by timothy on Friday February 19, 2016 @11:58AM from the ever-thus-shall-it-be dept.

szczys writes: Higher performance, lower power. One of the challenges with hitting both of those benchmarks is the need to adhere to established instruction sets like x86. One interesting development is the use of Variable Instruction Sets at the silicon level. The basic concept of translating established instructions to something more efficient for the specific architecture isn't new; this is what yielded the first low-power x86 processors at the beginning of the century. But those relied on the translation at the software level. A company called Soft Machine is paving the way for variable instructions in hardware. Think of it as an emulator for ARM, x86, and other architectures that is running on silicon for fast execution while sipping very little power.

52 comments

Flexible, but better than fixed? by JoeMerchant · 2016-02-19 12:11 · Score: 2

Can they really get better performance per watt on general computing using a flexible substrate? Seems like whatever design they set up the flexible (FPGAish?) circuit to do, could be faster and lower power if it were put into a fixed silicon (or similar) implementation. Maybe if your workload devolves down to very simple needs for long periods of time, this might take advantage of that.
1. Re:Flexible, but better than fixed? by Aighearach · 2016-02-19 16:17 · Score: 4, Informative
  
  The problem with your analysis is that the words "flexible" and "fixed" don't have technical meaning here. Those aren't differing physical characteristics to choose between, those are just human-level descriptions of how the programming will be organized.
  An FPGA intentionally has a whole bunch of extra circuits supporting each logical unit, those are expected to take a lot of extra power because it is additional functionality. An FPGA doesn't use more power per physical transistor, it just has a whole bunch of transistors and other logic devices for each programmable unit. When you then implement the circuit as an ASIC, it uses less power because it uses fewer logic devices, not because there is some other qualitative difference.
  With something like this, any extra logic devices would be specifically designed to manage the other logic devices for low power use. That is a very reasonable thing to try to do. Whether their implementation is successful and useful in the market is a whole different issue, of course.
  Transmeta was successful from an engineering perspective; their products used less power than their competitors. Problem was, they were only a few months ahead, and required too many changes in devices. All other companies had to do was be richer, and more able to secure access to new fab technologies.
  One big difference here is that this will potentially change thread management for programmers in a way that many people will like. It might very well be able to fragment the industry and corner a significant chunk of interest.
So, its like x86 already is? by BitZtream · 2016-02-19 12:13 · Score: 4, Interesting

So instead of the current situation where we have Intel/AMD processors doing something under the hood, using microcode as the language that translates the x86 environment into whatever is actually on the silicon ... and you're going to add ARM to it, and maybe some other ones?
That's cool and all, but it's not really all that useful, and Intel can pretty much already do that on any CPU it wants with a microcode update. ARM may not run as efficiently on the core that Intel uses, but it can be done from a technical point of view.
It's not worth it. That's why no one does it.
You'll effectively do nothing well.
Intel was an ARM licensee (probably still is), they know ARM as well as anyone outside of ARM itself ... and they made entirely new silicon to run it (well technically they bought it if I recall correctly) ... and it even had its own microcode ... But what they never did was share a single core between both ARM and x86 CPUs that could change modes with a microcode update. No reason they couldn't other than it's not efficient.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
1. Re:So, its like x86 already is? by Anonymous Coward · 2016-02-19 15:24 · Score: 0
  
  No reason they couldn't other than it's not efficient.
  Based on the facts you presented, it sounds to me like Intel chose not to do it because their lawyers told them not to.
2. Re:So, its like x86 already is? by KiloByte · 2016-02-19 16:47 · Score: 2
  
  Too bad both Intel and AMD keep their microcode closed. There's so much fun we could have if they were documented and non-Tivoized.
  Just the first use: shuffle opcode numbers, make your compiler emit those and recompile your software per-installation. Any exploits that use machine code are instantly thwarted.
  And that's just a start...
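A toy illustration of the opcode-shuffling idea, assuming a made-up four-instruction stack ISA and a per-installation seed (both hypothetical; real microcode is not exposed like this). The "compiler" emits that installation's opcode numbers, and the "CPU" only decodes that numbering:

```python
import random

# Hypothetical toy ISA for illustration only; not any real instruction set.
MNEMONICS = ["PUSH", "ADD", "PRINT", "HALT"]


def make_numbering(seed):
    """Pick a per-installation opcode numbering out of a 256-entry opcode space."""
    numbers = random.Random(seed).sample(range(256), len(MNEMONICS))
    encode = dict(zip(MNEMONICS, numbers))               # mnemonic -> number
    decode = {number: name for name, number in encode.items()}
    return encode, decode


def assemble(program, encode):
    """'Compile' a list of (mnemonic, operand) pairs with this install's numbering."""
    return [(encode[name], operand) for name, operand in program]


def run(code, decode):
    """Execute machine code; any number outside this install's table faults."""
    stack, output = [], []
    for number, operand in code:
        op = decode.get(number)
        if op is None:
            raise ValueError(f"illegal opcode {number}")
        if op == "PUSH":
            stack.append(operand)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "PRINT":
            output.append(stack[-1])
        elif op == "HALT":
            break
    return output


program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PRINT", None), ("HALT", None)]
encode, decode = make_numbering(seed=1234)
result = run(assemble(program, encode), decode)
```

Code assembled for one numbering runs normally on the matching decoder; injected code that assumes a different numbering would most likely hit an illegal opcode, though nothing here stops an attacker who can read code memory and learn the table.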
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
3. Re:So, its like x86 already is? by Dwedit · 2016-02-19 17:29 · Score: 1
  
  So that wouldn't stop pure return-oriented-programming, or if anyone knew that you were doing something like that, exploits that can read the code memory.
4. Re:So, its like x86 already is? by Anonymous Coward · 2016-02-19 23:06 · Score: 0
  
  That's cool and all, but it's not really all that useful
  Are you sure about the uselessness?
  It sounds to me like the holy grail of portability. Goodbye to bytecode, just run it in compiled form and switch the instruction set depending on process.
5. Re:So, its like x86 already is? by KiloByte · 2016-02-20 00:44 · Score: 1
  
  If we're already recompiling per-install, it would be easy to randomize a lot about the code, making return-oriented-programming moot (or at least massively harder if you have too little entropy). Shell/perl/etc can be "compiled" into a scrambled form. We can randomize kernel syscall numbers even today. But all of that is worth comparatively little if the biggest risk, machine code, is easily exploited.
  If you can read code memory then this technique can be defeated -- but needing to have two separate exploits for the same hack raises the difficulty a great deal.
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
6. Re:So, its like x86 already is? by drinkypoo · 2016-02-20 07:21 · Score: 1
  
  Intel was an ARM licensee (probably still is),
  Naw, they sold off XScale and the license presumably went with it
  
  they know ARM as well as anyone outside of ARM itself
  Naw, they knew how to make the ARM of the day fast, but not power-efficient. Everyone else's ARM sipped power per MIPS compared to XScale under Intel. Have not followed it under Marvell so I don't know if it ever turned out, but they still make it so it probably did.
  
  But what they never did was share a single core between both ARM and x86 CPUs that could change modes with a microcode update. No reason they couldn't other than it's not efficient.
  it's just a waste of time. the demand is not there. why mess around with it?
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Yea I'll believe it when I see it... by Anonymous Coward · 2016-02-19 12:17 · Score: 0

I remember studying about variable length instruction sets when I was studying CS back in the '70s. Scared the hell out of me. I'm glad I'll never see it in my lifetime...
1. Re:Yea I'll believe it when I see it... by stevew · 2016-02-19 12:23 · Score: 1
  
  Yep - the B1700 did this in the 1970s. Been there, done that... (I was a CPU engineer on one of them.)
  
  --
  Have you compiled your kernel today??
2. Re:Yea I'll believe it when I see it... by Guy Harris · 2016-02-19 12:39 · Score: 1
  
  I remember studying about variable length instruction sets when I was studying CS back in the '70s. Scared the hell out of me.
  "Variable" or "variable length"? There's nothing particularly exotic about instruction sets where not all instructions are the same length, so presumably you meant the former.
3. Re:Yea I'll believe it when I see it... by Guy Harris · 2016-02-19 12:50 · Score: 1
  
  Yep - the B1700 did this in the 1970s. Been there, done that... (I was a CPU engineer on one of them.)
  But they didn't translate programs compiled to the various S-languages directly into microcode and execute the microcode. From this article about them, it sounds as if that's what they're doing:
  
  The next issue on the list was the ISA which moves from a 32-bit one on the prototype to a full 64-bit version in the Shasta/Mojave pair. SM can run what it calls personalities in software but they are not implemented in the expected way. Personalities are software and are loaded at boot time, but they are both light and low-level. They don’t emulate code, they just translate it to the native ISA, a 32-bit add is a 32-bit add on both native and emulated hardware, but probably have differing opcodes. Occasionally this software will need to do something more complex but the bulk of the work is basically a big lookup table.
  Personalities are not purely software though, there are hardware hooks to assist in with the job, unfortunately SM did not go into more detail here. One thing they did say is that the code is not user accessible and runs underneath everything including a hypervisor where applicable. In x86 terms, think of this as ring -2 or something similar. To running code and users, everything should appear to be native hardware, assuming it all works as promised.
  so this looks more like Transmeta.
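The "big lookup table" description in the quoted article could look roughly like the sketch below. Every opcode number, the tuple layout, and the `slow_path` hook are assumptions invented for illustration; Soft Machines has not published how personalities actually work.

```python
# Hypothetical encodings: guest opcode -> native opcode with the same meaning.
GUEST_TO_NATIVE = {
    0x01: 0xA0,  # 32-bit add
    0x29: 0xA1,  # 32-bit subtract
    0x8B: 0xB0,  # 32-bit load
}


def translate(guest_code, table=GUEST_TO_NATIVE, slow_path=None):
    """Translate (opcode, *operands) tuples one-for-one via a lookup table.

    Anything the table does not cover is handed to slow_path, standing in for
    the "something more complex" the article mentions.
    """
    native = []
    for opcode, *operands in guest_code:
        if opcode in table:
            native.append((table[opcode], *operands))
        elif slow_path is not None:
            native.extend(slow_path(opcode, operands))
        else:
            raise NotImplementedError(f"no translation for opcode {opcode:#04x}")
    return native


translated = translate([(0x01, "r1", "r2"), (0x8B, "r3", 0x1000)])
```

The common case is a single dictionary lookup per instruction, which is what would make this kind of translation lighter than full emulation.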
4. Re:Yea I'll believe it when I see it... by Dunbal · 2016-02-19 13:02 · Score: 1
  
  translate programs compiled to the various S-languages directly into microcode and execute the microcode.
  What could possibly go wrong, in today's connected era. LOL.
  
  --
  Seven puppies were harmed during the making of this post.
5. Re:Yea I'll believe it when I see it... by Guy Harris · 2016-02-19 13:24 · Score: 2
  
  translate programs compiled to the various S-languages directly into microcode and execute the microcode.
  What could possibly go wrong, in today's connected era. LOL.
  That's pretty much what these guys do, although they've stopped calling what it gets translated to "microcode" (in the current machines, it's Power Architecture code, possibly with a few extensions such as tag bits).
  (They used to call the low-level OS and binary-to-binary translator code "vertical microcode", in the days before it was PowerPC/Power Architecture code, but that was for legal reasons; they didn't want to be forced to make the code available to clone makers. "Vertical microcode" ran out of main memory, and the binary-to-binary translator translated compiler output to the "vertical microcode" instruction set and ran it from main memory as well. The original "vertical microcode" instruction set was somewhat S/360-ish.)
6. Re:Yea I'll believe it when I see it... by stevew · 2016-02-19 14:37 · Score: 1
  
  Well - variable length instructions, variable length data path too. The S-ops were Huffman encoded, i.e. the most often used were the shortest. The B1700 had about a 3:1 code density advantage over the IBM 360 in Cobol if my memory serves me. Yes - likely Transmeta's JIT compiler is closer to what this is about. I also worked on the Cydra 5 which was a VLIW machine - its Achilles heel was solved by the JIT trick.
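To make "the most often used were the shortest" concrete, here is a minimal Huffman sketch that assigns opcode lengths from usage counts. The opcode names and frequencies are invented for the example and are not B1700 data.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Every symbol under the merged node sits one level deeper in the tree.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


# Hypothetical opcode usage counts per 100 executed instructions.
usage = {"LOAD": 40, "STORE": 25, "ADD": 20, "BRANCH": 10, "HALT": 5}
lengths = huffman_code_lengths(usage)
average_bits = sum(usage[op] * lengths[op] for op in usage) / sum(usage.values())
```

With these made-up counts the most common opcode gets a 1-bit code and the rarest get 4 bits, for an average of 2.1 bits per opcode versus 3 bits for a fixed-width encoding of five opcodes.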
  
  --
  Have you compiled your kernel today??
7. Re:Yea I'll believe it when I see it... by cold fjord · 2016-02-19 15:38 · Score: 1
  
  Yep - the B1700 did this in the 1970s. Been there, done that... (I was a CPU engineer on one of them.)
  So did the i432 a few years later.
  Intel iAPX 432
  
  The innovative features of the iAPX 432 were individually detrimental to good performance. Combined together, it ran slower than contemporary conventional microprocessor designs such as the Motorola 68010 and Intel 80286. One problem was that the two-chip implementation of the GDP limited it to the speed of the motherboard's electrical wiring. A larger issue was the capability architecture needed large associative caches to run efficiently, but the chips had no room left for that. The instruction set also used bit-aligned variable-length instructions (as opposed to the byte or word-aligned semi-fixed formats used in the majority of computer designs). Instruction decoding was much more complex than in other designs. In addition, the BIU was designed to support fault-tolerant systems, and in doing so up to 40% of the bus time was held up in wait states.
  
  --
  much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
8. Re:Yea I'll believe it when I see it... by Anonymous Coward · 2016-02-19 23:13 · Score: 0
  
  On the other hand, if you compare more recent architectures there is a certain benefit to fixed length instructions.
  In ARM Cortex M0 they have even gone so far as to remove a lot of the immediate data loading functionality if the data doesn't fit into the standard instruction length.
  This means that the instruction pipeline doesn't have to take possible data into consideration and can treat everything as instructions. For 16 or 32bit loads the instruction will instead hold an offset to the end of the function where a data table is placed.
  The number of memory accesses will be the same but the CPU will work more efficiently.
  Of course this doesn't matter on a CPU without pipeline or separate data/instruction caches.
  Code density was way more important back when memory was expensive. It still matters on microcontrollers, but not as much as it used to.
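A rough sketch of the literal-pool approach described above, assuming the simplest possible rule: a constant that fits in 8 bits is encoded in the instruction, anything larger goes into a data table after the function and is loaded PC-relative. The mnemonic strings and the `#pool+N` notation are schematic, not real assembler syntax, and real toolchains know more tricks than this.

```python
def plan_constant_loads(constants):
    """Decide, per (register, value) pair, between an immediate and a pool load.

    Returns (instructions, pool): instructions are schematic strings, pool is
    the list of 32-bit words to place after the function body.
    """
    instructions, pool = [], []
    for reg, value in constants:
        value &= 0xFFFFFFFF
        if value <= 0xFF:
            # Small constant: fits in the 8-bit immediate field.
            instructions.append(f"MOVS r{reg}, #{value}")
        else:
            # Large constant: share one pool slot per distinct value.
            if value not in pool:
                pool.append(value)
            offset = 4 * pool.index(value)
            instructions.append(f"LDR r{reg}, [pc, #pool+{offset}]")
    return instructions, pool


code, literal_pool = plan_constant_loads([(0, 7), (1, 0x12345678), (2, 0x12345678)])
```

Every instruction stays the same width, so the fetch and decode stages never have to skip over inline data; the cost is an extra data access for each large constant.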
9. Re:Yea I'll believe it when I see it... by TheRealHocusLocus · 2016-02-20 02:18 · Score: 1
  
  Yep - the B1700 did this in the 1970s. Been there, done that... (I was a CPU engineer on one of them.)
  So from within the dark entrails of Burroughs you were one of the Hardy Boys helping bring to light the Soul of a New Machine? That's awesome, and I'm not Kiddering. What if... those BCD architectures were designed from scratch using today's tools and methods? Is there something the machine could do inherently better or faster, something that benefits from the use of errorless unbounded decimal arithmetic?
  
  --
  <blink>down the rabbit hole</blink>
that quote is wrong by turkeydance · 2016-02-19 12:23 · Score: 1

nikto is the last word. as written, it's a jelly doughnut.
1. Re:that quote is wrong by sexconker · 2016-02-19 12:35 · Score: 1
  
  I scrolled down and saw what you're referring to:
  
  "Gort, klaatu nikto barada." -- The Day the Earth Stood Still
  Slashdot, turn in your stash of fake nerd cards. You're not even at poser level anymore.
2. Re:that quote is wrong by Guy Harris · 2016-02-19 12:56 · Score: 1
  
  nikto is the last word. as written, it's a jelly doughnut.
  "Gort, I am a jelly doughnut"?
3. Re:that quote is wrong by ClickOnThis · 2016-02-19 14:20 · Score: 1
  
  "Gort, Klaatu needs a jelly doughnut to be brought back to life. Oh, and please don't destroy the earth."
  
  --
  If it weren't for deadlines, nothing would be late.
WRONG! by Anonymous Coward · 2016-02-19 12:27 · Score: 0

Went RISC uOps to Get Perf. Anyone who thinks a PENTIUM was looking for Low Power Is AN IGNORANT SLUT!
1. Re:WRONG! by Guy Harris · 2016-02-19 13:05 · Score: 2
  
  Went RISC uOps to Get Perf. Anyone who thinks a PENTIUM was looking for Low Power Is AN IGNORANT SLUT!
  (Pentium Pro, but whatever.)
  No, but Transmeta did something similar to what it appears Soft Machine are doing, and did so to reduce power consumption; I think that's what "The basic concept of translating established instructions to something more efficient for the specific architecture isn't new; this is what yielded the first low-power x86 processors at the beginning of the century." was referring to.
So.... by wbr1 · 2016-02-19 12:31 · Score: 2

Transmeta with a hardware morphing layer?

--
Silence is a state of mime.
1. Re:So.... by Guy Harris · 2016-02-19 13:03 · Score: 1
  
  Transmeta with a hardware morphing layer?
  Maybe, maybe not. An article about them on SemiAccurate says "SM can run what it calls personalities in software but they are not implemented in the expected way. Personalities are software and are loaded at boot time, but they are both light and low-level. They don’t emulate code, they just translate it to the native ISA, a 32-bit add is a 32-bit add on both native and emulated hardware, but probably have differing opcodes." and "Personalities are not purely software though, there are hardware hooks to assist in with the job, unfortunately SM did not go into more detail here."
  Perhaps there's more hardware assistance than in Transmeta's chips, where I think the instruction set was somewhat oriented towards emulating an x86 and some hardware features helped, but the translation itself was done in software - maybe there's some hardware assistance in the translation process, although that might bias it towards particular instruction sets if done naively.
2. Re:So.... by Pulzar · 2016-02-19 13:39 · Score: 1
  
  a 32-bit add is a 32-bit add on both native and emulated hardware, but probably have differing opcodes
  That breaks down very quickly when you get to any memory operations, as well as all the various flavours of SIMDs...
  It really doesn't make much sense that you can be more power efficient in your implementation of the behaviour and ordering of an exclusive store-release transaction using generic ops compared to hardware that was explicitly built and optimized for that instruction.
  Yeah, maybe your integer and floating point units are better optimized (no real reason why they would be, though), but that's not where most of the power and performance comes from. Not in a CPU, anyway.
  
  --
  Never underestimate the bandwidth of a 747 filled with CD-ROMs.
3. Re:So.... by Rockoon · 2016-02-20 00:57 · Score: 1
  
  a 32-bit add is a 32-bit add on both native and emulated hardware
  Hate to tell you this, but no...
  
  On x86 a 32-bit add also updates a flags register that is commonly leveraged. A full emulation of this register would be quite expensive on architectures that don't automatically track all of the same things.
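To show how much extra work hides behind "a 32-bit add", here is a sketch of what software has to compute to reproduce the six status flags an x86 ADD updates (CF, PF, AF, ZF, SF, OF). The function name and the dict layout are just choices for this example.

```python
MASK32 = 0xFFFFFFFF


def add32_with_flags(a, b):
    """Return (result, flags) for a 32-bit add, modelling x86 ADD status flags."""
    a &= MASK32
    b &= MASK32
    wide = a + b
    result = wide & MASK32
    flags = {
        "CF": wide > MASK32,                                  # unsigned carry out of bit 31
        "PF": bin(result & 0xFF).count("1") % 2 == 0,         # even parity of the low byte
        "AF": ((a ^ b ^ result) & 0x10) != 0,                 # carry out of bit 3
        "ZF": result == 0,
        "SF": (result >> 31) == 1,
        "OF": (((~(a ^ b)) & (a ^ result)) >> 31) & 1 == 1,   # signed overflow
    }
    return result, flags


wrapped, wrapped_flags = add32_with_flags(0xFFFFFFFF, 1)
overflowed, overflowed_flags = add32_with_flags(0x7FFFFFFF, 1)
```

A host ISA that tracks only some of these (or none) would have to emit several extra operations per add, unless the translator can prove the flags are never read before being overwritten.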
  
  --
  "His name was James Damore."
soft machine by Anonymous Coward · 2016-02-19 12:31 · Score: 0

http://www.amazon.com/The-Soft-Machine-William-Burroughs/dp/0802133290
1. Re:soft machine by Guy Harris · 2016-02-19 12:51 · Score: 1
  
  http://www.amazon.com/The-Soft-Machine-William-Burroughs/dp/0802133290
  Or Soft Machine, the band.
2. Re:soft machine by Anonymous Coward · 2016-02-19 22:09 · Score: 0
  
  named after the book ...
What's old is old again... by slew · 2016-02-19 12:31 · Score: 1

Translating instruction to micro-ops to run on a VLIW-ish backend? I think every high performance architecture does that now (arm and x86)
Share processing resources between cores? AMD tried to share the FP pipeline (flex FP?) between cores starting with their bulldozer architecture, but it looks like they are going to abandon that with their zen architecture after getting beat up about single thread perf...
1. Re:What's old is old again... by Guy Harris · 2016-02-19 12:54 · Score: 1
  
  Translating instruction to micro-ops to run on a VLIW-ish backend? I think every high performance architecture does that now (arm and x86)
  x86 - and z/Architecture, with the z13 chip - but do any ARM processors (or other RISC processors) do that?
2. Re:What's old is old again... by Anonymous Coward · 2016-02-19 18:57 · Score: 0
  
  Many of you might already know this, but the problem with sharing the FPU was purely that they didn't have as many resources as Intel's chips. If the FPU had been twice as powerful as a normal FPU, sharing it would have been a net gain, but it wasn't. For more clarity, they thought they could spend the transistors more effectively to increase the possible clock speed, but in the end were left with a very narrow processor that was barely higher clockspeed than Intel's chips.
  A single core that was eight times wider and capable of servicing eight threads simultaneously would slaughter an eight core in every workload, ceteris paribus. The problem is that doing so becomes ridiculously complicated (many things are O(N^2) in the width per core, or even worse, whereas most things in going more cores are O(N), and the areas that are worse have already gone through the extra design work), making the engineering cost far more than Intel is willing to spend, and AMD is capable of spending (AMD even had to switch to a much less efficient automatic design per core for cost reasons, part of why their transistor counts bloated so much without attendant gains). Multicore is simply a hack to reduce the engineering workload while making a very wide processor. This is another hack trying to do exactly the same thing.
  I don't think this will work, but it is definitely an area worth researching, and it could conceivably become an important product in the history of processor design even if it does fail.
3. Re:What's old is old again... by Blaskowicz · 2016-02-19 20:09 · Score: 1
  
  Look up nvidia Denver, it is a wide CPU that is said to do that sort of thing.
Uh... no by mark-t · 2016-02-19 12:41 · Score: 1

The 6502, for example, makes for a beautiful stack machine...
As the 6502 only had a single stack, limited in size to 256 bytes, and hard coded to reside at memory address range 0x0100-0x01ff, I might tend to disagree with that assessment.

--
File under 'M' for 'Manic ranting'
1. Re:Uh... no by jeremyp · 2016-02-19 14:43 · Score: 2
  
  You forget that the 6502 could do indirect addressing through any of the zero page locations giving you a potential 128 stack pointers for your stack machine. Also, the zero page had a special address mode so that loads, stores and increments/decrements could be done with two byte instructions instead of three byte instructions, reducing the fetch execute time by one cycle.
  However, I think the main reason stack machines were often implemented on 6502 has more to do with its relative lack of registers. There's only one register on which you can do arithmetic which means that you have to constantly save data out of it when calculating complex expressions. The stack machine architecture makes this a fairly simple task.
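A small Python model of the zero-page trick, assuming each software stack pointer is a 16-bit little-endian pointer stored in two adjacent zero-page bytes and dereferenced indirectly. This is a model of the idea, not 6502 code; the class name and the zero-page addresses 0x10 and 0x12 are arbitrary choices for the example.

```python
class ZeroPageStacks:
    """64 KiB of memory with software stacks addressed through zero-page pointers."""

    def __init__(self):
        self.ram = bytearray(0x10000)

    def set_pointer(self, zp, addr):
        # Store a 16-bit pointer little-endian at zero-page locations zp, zp+1.
        self.ram[zp] = addr & 0xFF
        self.ram[(zp + 1) & 0xFF] = (addr >> 8) & 0xFF

    def get_pointer(self, zp):
        return self.ram[zp] | (self.ram[(zp + 1) & 0xFF] << 8)

    def push(self, zp, value):
        # Write through the pointer, then advance it (an upward-growing stack).
        addr = self.get_pointer(zp)
        self.ram[addr] = value & 0xFF
        self.set_pointer(zp, (addr + 1) & 0xFFFF)

    def pop(self, zp):
        addr = (self.get_pointer(zp) - 1) & 0xFFFF
        self.set_pointer(zp, addr)
        return self.ram[addr]


machine = ZeroPageStacks()
machine.set_pointer(0x10, 0x2000)   # data stack
machine.set_pointer(0x12, 0x3000)   # a second, independent stack
machine.push(0x10, 2)
machine.push(0x10, 3)
machine.push(0x12, 99)
machine.push(0x10, machine.pop(0x10) + machine.pop(0x10))   # stack-machine ADD
total = machine.pop(0x10)
```

With 256 zero-page bytes and two bytes per pointer, this layout gives the 128 potential stack pointers mentioned above, none of them limited to the hardware stack's fixed 256-byte page.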
  
  --
  All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
2. Re:Uh... no by mark-t · 2016-02-20 04:30 · Score: 1
  
  While zp access takes fewer bytes, and would indeed reduce the fetch time, indirect addressing instructions on the 6502 took 5 to 6 cycles to execute, and were among the instructions that take the longest time to execute (only the 7-cycle instructions, such as ASL absolute,X, take longer).
  
  --
  File under 'M' for 'Manic ranting'
Well designs are all well and good by Anonymous Coward · 2016-02-19 12:50 · Score: 0

But do they have silicon, and pilot hardware?
as in dynamic binary translation? by Anonymous Coward · 2016-02-19 13:50 · Score: 0

Equals: Binary Translation: lots of history in industry (I'm skipping research projects):
DEC: vax, mips, sparc, x86 inputs, target of Alpha. Personally Involved with all of these
Transmeta: x86 to host (Crusoe)
NVidia: project Denver
AMD: Dynopt (x86 dynamic binary optimization)
I've got an idea... by Anonymous Coward · 2016-02-19 14:13 · Score: 1

Make a cpu with just a few instructions and do complex stuff by repeating simple things many times, fast....oh hang on...
I can only wonder... by tlambert · 2016-02-19 14:19 · Score: 1

I can only wonder... if the Crusoe and Efficeon patents are being licensed from Intellectual Ventures (who ended up owning them), or if we are going to see another East Texas lawsuit over this.
Not that great by gweihir · 2016-02-19 14:57 · Score: 1

There is a reason the idea fizzled: If you have very special code, it may be able to compete speed-wise, otherwise it will be slower. As compilers optimize better these days, it will be even worse today. And the "low power" is a red herring: If you want that (at slow speed), compile to ARM code, not to x86.
My guess is somebody is looking for funding from clueless people.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re: Old is new by chaboud · 2016-02-19 15:20 · Score: 4, Funny

Wow dude... that's like, super meta... I mean, it's basically transmeta...
Get it? Is this thing on?
ReTransMeta by softcoder · 2016-02-19 15:31 · Score: 1

Yes and Transmeta were not particularly successful in spite of the promise the technology showed.
1. Re:ReTransMeta by dbIII · 2016-02-20 04:31 · Score: 1
  
  Stuff happened around that time that made a very large number of companies not particularly successful and investment dried up.
  The companies that survived were the ones with money in the bank and not necessarily the ones with the best ideas or products.
  It also doesn't help much when your customers and potential customers either go bust or stop spending money.
Transmeta? by istartedi · 2016-02-19 20:59 · Score: 3, Informative

This sounds like Transmeta. Remember that, Slashdot old-timers? The company had trouble, and was eventually bought by private equity. I'm too lazy to find out if this is a re-emergence by the rights holders, or if they're going to get sued by the guys who bought Transmeta's IP. IIRC, it was an Israeli company that took it off the US exchange. After that I lost track of it.

--
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Re:Why? This is as bad as Firefox by Anne Thwacks · 2016-02-19 22:57 · Score: 1

I think the dead BSDs will soon be making references to "on the third day, he rose again". I expect Netcraft will confirm it on Easter Day!
Or maybe Netflix. It has the makings of a great Zombie movie - with Orson Welles as RMS, Vincent Price as Theo, Peter Cushing and Christopher Lee as Kernighan and Ritchie, Lon Chaney as SCO and a special appearance by Boris Karloff as Bill Gates.
I am willing to write the screen play (in K&R C) for a considerable fee. Someone else will have to do the CGI.

--
Sent from my ASR33 using ASCII