Transmeta Awarded Another Patent

Abstract (very) for the lazy folks by Wah · 1999-09-28 04:15 · Score: 3

Apparatus for use in a processing system having a host processor capable of executing a first instruction set to assist in running instructions of a different instruction set which is translated to the first instruction set by the host processor including circuitry for temporarily storing memory stores generated until a determination that a sequence of translated instructions will execute without exception or error on the host processor, circuitry for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor, and circuitry for eliminating memory stores temporarily stored when a determination is made that a sequence of translated instructions will generate an exception or error on the host processor.
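A minimal Python sketch of the mechanism the abstract describes, assuming a toy memory held in a dict. Stores from a translated sequence are buffered, committed as a group if the sequence finishes cleanly, and discarded if it faults. The class, the `TranslationFault` exception and the "negative address faults" rule are hypothetical illustrations, not anything specified in the patent.

```python
class TranslationFault(Exception):
    """Hypothetical stand-in for an exception raised by a translated sequence."""


class GatedStoreBuffer:
    """Holds memory stores from a translated sequence until it is known
    whether the sequence completed without exception."""

    def __init__(self, memory):
        self.memory = memory  # "permanent" memory: address -> value
        self.pending = {}     # temporarily stored memory stores

    def store(self, addr, value):
        self.pending[addr] = value  # buffer the store; memory is untouched

    def load(self, addr):
        # A sequence must see its own buffered stores before older memory.
        return self.pending.get(addr, self.memory.get(addr, 0))

    def commit(self):
        # Sequence ran cleanly: make its stores permanent.
        self.memory.update(self.pending)
        self.pending.clear()

    def discard(self):
        # Sequence faulted: eliminate its buffered stores.
        self.pending.clear()


def run_translation(buf, stores):
    """Execute a sequence of (addr, value) stores; a negative address faults."""
    try:
        for addr, value in stores:
            if addr < 0:
                raise TranslationFault(f"bad address {addr}")
            buf.store(addr, value)
    except TranslationFault:
        buf.discard()
        return False
    buf.commit()
    return True


mem = {}
buf = GatedStoreBuffer(mem)
print(run_translation(buf, [(0, 1), (4, 2)]))   # True
print(run_translation(buf, [(8, 3), (-1, 9)]))  # False
print(mem)                                      # {0: 1, 4: 2}
```

The second sequence faults partway through, so its store to address 8 never reaches `mem`. The patent describes this as circuitry; the class only models the observable behavior.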

hmmm???

--
+&x

A few other things by Stradivarius · 1999-09-28 12:43 · Score: 5

OK, these are just a few other bits of interest I picked out of the patent:

In a preferred embodiment of the invention, the morph host is a very long instruction word (VLIW) processor which is designed with a plurality of processing channels.

I'm not going to go into huge detail about VLIW machines (particularly since I don't know all that much about them :-). Suffice it to say that traditional VLIW CPUs fetch multiple instructions at once and rely on the compiler to ensure that there are no dependencies between the instructions in a fetch group (if the compiler can't find enough independent instructions to fill the group, it pads the holes with no-operation instructions, or NOPs). Looking at Transmeta's patent, it appears that rather than a compiler doing this checking, their code-translation software will be doing it on the fly. Superscalar RISC/CISC machines, on the other hand, typically do this checking in hardware. Transmeta's reasoning seems to be that doing it in hardware adds complexity, and hence lowers clock rates, and also makes supporting multiple instruction sets much less feasible.
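A minimal sketch of the packing step described above, assuming a hypothetical `(opcode, dest, sources)` tuple format, single-cycle latencies and fully general issue slots. It packs strictly in program order and pads with NOPs; it does not reorder, which real VLIW compilers (and, per the patent, Transmeta's translator) also do.

```python
# Hypothetical instruction format: (opcode, destination register, source registers).
NOP = ("nop", None, ())


def pack_bundles(instrs, width):
    """Pack instructions, in program order, into fixed-width VLIW bundles.

    An instruction that reads or overwrites a register written earlier in the
    current bundle (or that finds the bundle full) closes the bundle, and any
    empty slots are padded with NOPs. Assumes single-cycle latency.
    """
    bundles, current, written = [], [], set()
    for instr in instrs:
        _, dest, srcs = instr
        dependent = dest in written or any(s in written for s in srcs)
        if dependent or len(current) == width:
            bundles.append(current + [NOP] * (width - len(current)))
            current, written = [], set()
        current.append(instr)
        written.add(dest)
    if current:
        bundles.append(current + [NOP] * (width - len(current)))
    return bundles


program = [
    ("add", "r1", ("r2", "r3")),
    ("mul", "r4", ("r5", "r6")),
    ("sub", "r7", ("r1", "r4")),  # needs both results above
    ("ld", "r8", ("r9",)),
]
bundles = pack_bundles(program, width=3)
for bundle in bundles:
    print([op for op, _, _ in bundle])
# ['add', 'mul', 'nop']
# ['sub', 'ld', 'nop']
```

The independent `ld` could have filled the NOP slot in the first bundle; finding moves like that is exactly the scheduling work the translator would have to do on the fly.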

Regarding the instruction translation and subsequent caching I mentioned in my previous post, a quote from the patent illuminates the matter a little more:

The code morphing software of the microprocessor...includes a translator portion which decodes the instructions of the target application, converts those target instructions to the primitive host instructions capable of execution by the morph host, optimizes the operations required by the target instructions, reorders and schedules the primitive instructions into VLIW instructions (a translation) for the morph host, and executes the host VLIW instructions.

When the particular target instruction sequence is next encountered in running the application, the host translation will then be found in the translation buffer and immediately executed without the necessity of translating, optimizing, reordering, or rescheduling. Using the advanced techniques described below, it has been estimated that the translation for a target instruction (once completely translated) will be found in the translation buffer all but once for each one million or so executions of the translation. Consequently, after a first translation, all of the steps required for translation such as decoding, fetching primitive instructions, optimizing the primitive instructions, rescheduling into a host translation, and storing in the translation buffer may be eliminated from the processing required. Since the processor for which the target instructions were written must decode, fetch, reorder, and reschedule each instruction each time the instruction is executed, this drastically reduces the work required for executing the target instructions and increases the speed of the microprocessor of the present invention.

Transmeta seems to have an excellent idea here. They're caching optimized translations of the incoming instructions, so rather than having to translate and optimize each time you see that bit of code, you do it once and then just grab it from the cache. Due to the spatial and temporal locality of programs (i.e. the fact that your accesses to instructions are not random, but are localized in loops, etc.), this cache (the "translation buffer") is estimated to miss only about once per million or so executions of a translation. So you're doing roughly *one* translation per million executions, rather than decoding and scheduling the same instructions a million times as current processors have to. Interestingly enough, a scheme like this was brought up as a discussion item in my Superscalar Processor Design class a couple of weeks ago, though my professor used the example of a specialized Alpha decoding/translating x86 and caching the results. One might even write the translations back out to disk as an attachment to the original executable, so that the next time you run the program there are fewer translations to do, and eventually you'll have a fully translated version on your hard disk for optimal speed. I guess we'll just have to wait to see if Transmeta does something similar.
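The caching idea boils down to a lookup table keyed by guest code address. A toy Python version, assuming a hypothetical `toy_translate` function standing in for the real decode/optimize/schedule pipeline:

```python
class TranslationCache:
    """Toy translation buffer keyed by guest (e.g. x86) code address."""

    def __init__(self, translate):
        self.translate = translate  # slow path: guest block -> host translation
        self.buffer = {}            # guest address -> cached host translation
        self.misses = 0

    def lookup(self, guest_addr, guest_block):
        host = self.buffer.get(guest_addr)
        if host is None:
            # Miss: decode, optimize and schedule only this once.
            self.misses += 1
            host = self.translate(guest_block)
            self.buffer[guest_addr] = host
        return host  # hit: reuse the stored translation directly


def toy_translate(block):
    # Hypothetical stand-in for the real translator/optimizer.
    return tuple("host_" + op for op in block)


cache = TranslationCache(toy_translate)
loop_body = ("mov", "add", "cmp", "jne")
for _ in range(1_000_000):
    cache.lookup(0x100, loop_body)
print(cache.misses)  # 1
```

A real translation buffer would also need to bound its size and invalidate entries if the guest code changes underneath it; both are omitted here.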

One embodiment of the enhanced hardware includes sixty-four working registers in the integer unit and thirty-two working registers in the floating point unit. The embodiment also includes an enhanced set of target registers which include all of the frequently changed registers of the target processor necessary to provide the state of that processor; these include condition control registers and other registers necessary for control of the simulated system.

It seems this new chip is going to have a lot of registers. As Cartman would say, sweeeeeet!
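A rough sketch of how working and target registers could pair with the gated stores, assuming (as the abstract implies, though this quote does not spell it out) that the working registers hold speculative values while the target registers hold the last committed state of the emulated processor. The class and register names are hypothetical.

```python
class ShadowedRegisters:
    """Working (speculative) registers backed by committed target registers."""

    def __init__(self, names):
        self.official = {n: 0 for n in names}  # last committed target state
        self.working = dict(self.official)     # speculative copy used by translations

    def write(self, name, value):
        self.working[name] = value

    def read(self, name):
        return self.working[name]

    def commit(self):
        # Translation finished cleanly: working state becomes official.
        self.official = dict(self.working)

    def rollback(self):
        # Translation faulted: restore the last committed state.
        self.working = dict(self.official)


regs = ShadowedRegisters(["eax", "ebx", "eflags"])
regs.write("eax", 42)
regs.commit()
regs.write("eax", 7)
regs.write("ebx", 99)
regs.rollback()
print(regs.read("eax"), regs.read("ebx"))  # 42 0
```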

The patent also provides some sample C code, the corresponding x86 assembly, and some sample optimizations the Transmeta system may perform. It's a little more than halfway down the page; if you want to look, just scroll until you see code :-)

Re:What it Really does by Sloppy · 1999-09-28 05:34 · Score: 3

Can't think of a situation (except for processor bugs like the F00F one), where the processor hangs in mid of some instruction, stumbled over some microcode gone crazy. So I simply see no benefit of a rollback of an instruction, sorry.

Happens all the time, although there are already ways of dealing with it. Consider virtual memory. Having to redo an instruction, because some exception occurred in the middle of it, isn't very .. um .. exceptional.

But I can't think of how this relates to the Transmeta speculations. Well, actually I can, but my theory is so wild-ass that everyone would laugh at me.

Oh, what the hell. This is as good a place as any for me to make a complete fool of myself... I think Transmeta is making a display circuit that instead of fetching each pixel from a frame buffer, executes a little program for each pixel. The program must execute incredibly fast since the result must be available before the horizontal scan goes to the next pixel.

There, I said it. Now everyone can back away from me quietly, and then point and laugh when they reach a safe distance.

---
Have a Sloppy day!

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Further clarification by Stradivarius · 1999-09-28 05:35 · Score: 5

Some notes for those who may want a more detailed explanation:

The beginning of the patent ("claims") is essentially just a list of things that all modern, superscalar, out-of-order processors do, and saying "hey we do this too".

Basically, out-of-order machines execute instructions out of their program order (hence the name :). This means that if your code sequence is A, B, C, the CPU may actually execute it such that B is done executing before A. But B's results cannot be written to system memory or the architected registers (the "machine state") until you know that instruction A didn't generate an exception. That's so that you can provide precise exception handling, i.e. so that the OS can service A's exception and then resume execution with B. If you don't hold back B's memory store, it will already have happened when execution resumes, and you'll end up executing B twice, which you didn't intend. So that's what all the talk at the beginning of the patent about memory stores, etc., is about.
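To make the A, B, C example concrete, here is a toy reorder buffer in Python. It is a sketch assuming single-result instructions and a hypothetical `Entry` record, not how any real CPU or the patent implements retirement: results may complete in any order, but they only reach the architected registers in program order, and a faulting instruction squashes everything younger than it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str                     # instruction label, e.g. "A"
    dest: str                     # destination register
    result: Optional[int] = None  # value computed (possibly out of order)
    done: bool = False            # finished executing?
    fault: bool = False           # raised an exception?


def retire(rob, arch_regs):
    """Retire finished entries from the head of the reorder buffer in program
    order. A faulting head squashes itself and every younger entry, so no
    later instruction's result reaches architected state (precise exception)."""
    retired = []
    while rob and rob[0].done:
        head = rob.popleft()
        if head.fault:
            rob.clear()  # discard B, C, ... even if they already executed
            return retired, head.name
        arch_regs[head.dest] = head.result
        retired.append(head.name)
    return retired, None


regs = {}
rob = deque([Entry("A", "r1"), Entry("B", "r2"), Entry("C", "r3")])
rob[1].result, rob[1].done = 20, True    # B finishes before A
print(retire(rob, regs))                 # ([], None): B must wait for A
rob[0].done, rob[0].fault = True, True   # A raises an exception
print(retire(rob, regs))                 # ([], 'A')
print(regs)                              # {}: B's result was never committed

# After the OS services A's exception, execution resumes at A.
rob = deque([Entry("A", "r1", 10, True), Entry("B", "r2", 20, True)])
print(retire(rob, regs))                 # (['A', 'B'], None)
print(regs)                              # {'r1': 10, 'r2': 20}
```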

If you get past all the uninteresting stuff like that in the beginning, you'll find the following:

"The present invention overcomes the problems of the prior art and provides a microprocessor which is faster than microprocessors of the prior art, is capable of running all of the software for all of the operating systems which may be run by a large number of families of prior art microprocessors, yet is less expensive than prior art microprocessors."

The idea, it seems, is that rather than making complex hardware to execute the instructions and perform speed enhancements, they're doing speed optimizations in software. This in turn allows very simple hardware (which should in turn translate to really high clock speeds). It seems that Transmeta's bet is that the penalty incurred by doing software rather than hardware optimizations is offset by the increase in clock speed and decrease in hardware cost.

Using such an approach should also make running multiple instruction sets a much easier task. Current processors do their instruction decoding in hardware. But if Transmeta has managed to do this decoding (fast) in software, then they can just add a little more software to allow multiple instruction sets. They also seem to be caching the translations of non-native to native instructions in a memory structure of some sort, so that they minimize the redundant emulation computations.

Actually, to address gupg's comment, it also seems that they should not need *any* special compiler support, because they can run stuff that was compiled for any of the various instruction sets they choose to support. So they themselves should not need to do compiler work. I would guess that the reason they're hiring all sorts of compiler folks is that they need people to do the aforementioned software instruction translation, and the people best suited for that are compiler people since they work on the instruction level all the time. Most other programmers don't have to deal with anything other than high-level languages, and so would not be particularly well suited to doing what Transmeta is doing.

Anyway, hopefully this explained things a bit more to everyone. My reading and explanation of the patent was pretty quick since I have to go to class in a few minutes. I'll finish reading the patent afterwards and add anything else I think you might like to know.

Cheers,
Stradivarius

My layman's explanation by sporkboy · 1999-09-28 04:16 · Score: 5

It appears that the flow will be like this.

1. Set of instructions comes into processor in one instruction set (like x86).

2. This device stores the data for this series of instructions temporarily.

3. The device translates the (x86) instructions into its own internal instruction set and figures out an ordering that will not cause it to have exceptions.

4. The device retrieves the temporary data and "fills in the blanks" in the "inner" processor to get results; the so-called "permanent storage" is probably the inner processor's instruction cache.

5. The data is cleared from the interim area once it's acted upon.

It means .. by gupg · 1999-09-28 04:17 · Score: 5

I think it means (from the abstract) that they are going to provide compatibility with other processors by converting those processors' instructions to their host processor's instruction set. So, the story unfolds. Obviously, they have a super fast processor and will provide for running Intel etc. instructions on it.

The patent itself is more concerned with making sure that the conversion process occurs without any exceptions taking place .. or actually holding the processor state and waiting for a sequence of instructions to make sure no exception etc. happens and then executing it on the host processor.

They obviously also need strong compiler support for such a processor which explains all the software and compiler people they have been recruiting.

Fun, fun, fun .. who says computer architecture is dead!

Sumit

It means by FooBarSmith · 1999-09-28 04:18 · Score: 4

Either the people at Transmeta really need to take a course in basic English, and especially punctuation, or they have a random patent generator that strings together random combinations of processor, executed, processing, circuits, determination and stores.

My money is on the latter, maybe Linus whipped together a Perl script in his lunch hour?

--
stty erase ^H

It helps if you run it through babelfish... by neuroid · 1999-09-28 04:20 · Score: 4

instrument for US one processing system t a capable processing host executing a first instruction ajust ajud functioning instruction a different instruction ajust that est translates first instruction ajust processing host including provisory circuit for storing memory armazen to ger until a determination that a sequence translates instruction execut without exception or error processing host, permanent circuit for storing provisory memory armazen stored when a determination est faç that a sequence translates instruction execut without exception or error processing host, and circuit for eliminating provisory memory armazen stored when a determination est faç that a sequence translates instruction to ger an exception error in the processor.

That's from English->Portuguese->English

What it Really does by Coventry · 1999-09-28 04:23 · Score: 5

Ok, it's for emulation, but it Doesn't Just speed emulation. This allows for instruction ROLLBACK. Want a journaling filesystem? How about a journaling processor?
The patent is for a co-processing unit that not Only translates a foreign instruction set into native instructions for a 'target processor', But also acts as a go-between for that target processor and memory. It stores the processor state, and buffers any memory writes, until it is certain that a group of instructions has been run without exception or error... If the translated instructions crash, no damage is done. Not only is this amazing overall, but it allows for Very speculative, and Very fast, instruction translation and branch prediction...

--
man is machine

A HERRING! by Skip666Kent · 1999-09-28 04:23 · Score: 4

Those bastards have patented my favorite fish! Of all the nerve!

Really, tho', it could be a Red Herring. Transmeta could be cashing in on the popular assumption that they're going to create a wild new processor that'll be Everything to Everyone in order to disguise the fact that they're really in the process of opening the ULTIMATE multimedia porn site for cyber-trans-sexuals.

(Not that there's anything wrong with that...)

--
**>>BELCH

Re:Hmm ... by el_diablo · 1999-09-28 04:25 · Score: 3

looks like a cpu which reads foreign instruction sets and then translates them into its own set and execs them in a highly parallel manner to produce faster execution than the original processor. The trick here looks to be finding out which things can be done in parallel without causing an exception. End result is a Transmeta chip that runs the instructions of other chips faster.

Re:Veil, indeed. by Psyicide · 1999-09-28 04:25 · Score: 3

Yes. It seems that Transmeta has perfected a run-on sentence processor, able to mutate any reasonable statement into a more obscured but equivalent sentence until comprehension is completely lost by the reader.

Quick Summary by meta4 · 1999-09-28 04:25 · Score: 3

If you had scrolled way down the page, you would have found this:

SUMMARY OF THE INVENTION

It is, therefore, an object of the present invention to provide a host processor with apparatus for enhancing the operation of a microprocessor which is less expensive than conventional state of the art microprocessors yet is compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate than those other microprocessors.

This and other objects of the present invention are realized by apparatus for use in a processing system having a host processor capable of executing a first instruction set to assist in running instructions of a different instruction set which is translated to the first instruction set by the host processor comprising means for temporarily storing memory stores generated until a determination that a sequence of translated instructions will execute without exception or error on the host processor, means for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor, and means for eliminating memory stores temporarily stored when a determination is made that a sequence of translated instructions will generate an exception or error on the host processor.

These and other objects and features of the invention will be better understood by reference to the detailed description which follows taken together with the drawings in which like elements are referred to by like designations throughout the several views.

Transmeta by technos · 1999-09-28 04:26 · Score: 5

It appears to be a system in which a processor is fed a sequence of instructions in a translated foreign set, and the results are held in cache until it can be ascertained that the entire stream of instructions will run without error, at which time the cache is released. They may be using this purely as a CISC-to-RISC mechanism, or they may be planning a platform where the actual program code is 'broken' into chunks, and the processors might encounter exceptions if the granularity of the sets is off. They may even be planning a platform that does multi-arch emulation on a transparent hardware/microcode level, à la AS/400. Heck, they might be doing all three! They also allude to making a cheap processor run code designed for a more expensive one, so perhaps they're planning to give Intel a run for their money.

I'm sorry, but that is the closest I can get to an answer with the available information.

--
.sig: Now legally binding!

Huh? by scumdamn · 1999-09-28 04:35 · Score: 4

"circuitry for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor"

I vote for it being the random patent generator. My favorite part of the whole soliloquy is

permanently storing memory stores temporarily stored

you can't beat that! Maybe they're really working on an optical processor and want us to think they're working on a universal processor that'll run any other processor's code. Good one, Linus (and others), but what does it really do?

Sure Looks Related To Their Other Patents by Christopher B. Brown · 1999-09-28 04:36 · Score: 3

The latest patent surely looks related to the other patents previously awarded...

The basic idea for all of the patents has been to provide mechanisms to allow one to:

Create a new CPU that uses one instruction set;
That CPU is emulating the instruction set of some other CPU ( Oh, Say, Perhaps IA-32... );
The patent provides for some scheme whereby instructions are run in some sort of "emulation mode," where they try to execute in a sort of abeyance...
The system then seeks to detect situations where the emulation starts going astray, and provides mechanisms for "coping with this error."

The various patents have involved that mechanism for coping with the errors, with an attempt to construct ways of quickly working around them.

This parallels the notion of Lagrangian Relaxation, where you take a problem, with various restrictions, and relax those restrictions. In exploring the solution space, the system will find solutions that aren't in the feasible solution space of the (unrelaxed) problem.

In the case of Lagrangian Relaxation, the way of coping with that is to add terms to the objective function, weighted by Lagrange multipliers, that penalize violations of the relaxed constraints, thus encouraging the system to head towards feasible solutions.
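For reference, the textbook form of Lagrangian relaxation, with the constraints g_i(x) ≤ 0 moved into the objective using multipliers λ_i ≥ 0 (standard notation, not anything from the patent):

```latex
\begin{aligned}
\text{original:}\quad & \min_{x \in X} f(x) \quad \text{s.t. } g_i(x) \le 0,\; i = 1,\dots,m \\
\text{relaxed:}\quad  & q(\lambda) = \min_{x \in X} \Big[\, f(x) + \sum_{i=1}^{m} \lambda_i \, g_i(x) \Big], \qquad \lambda_i \ge 0
\end{aligned}
```

Any feasible x has g_i(x) ≤ 0, so the added terms are nonpositive and q(λ) ≤ f(x); an infeasible x with some g_i(x) > 0 is charged a positive penalty λ_i g_i(x), which is what pushes the search back toward feasible solutions.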

In the case of Patent 5958061, the "relaxation" is that the system performs the emulated instructions, modifying a temporary memory store, and rolling back when it hits cases where the preliminary emulation results in errors on the host processor.

Patent 5832205 concentrated, in contrast, on the apparatus to detect a failure of speculation.

--
If you're not part of the solution, you're part of the precipitate.

Have we really thought this through yet?? by pgm · 1999-09-28 04:48 · Score: 3

OK, after reading a ton of messages, I'm thinking about this whole instruction translation issue. If Transmeta is making a "co-processor" that would translate instruction sets, and _THIS_ thing can store the existing state of the processor then....

Couldn't they theoretically (sic) be working on a system that would allow you to run MULTIPLE instruction sets inside a single OS?? The implications would turn the existing software industry (of which I am a part) on its ear!

Could we actually have a box running some form of unix, and actually be able to run ANY application natively on it, no matter what OS it was written for?? Think about running a BeOS app next to a Win32 app, next to an application compiled for i386 Red Hat! WOAH.

If this is even close to what actually exists in Transmeta's labs, then we are in for a serious roller coaster over the next couple of years!

Quivering with anticipation...
p.

Re:Hmm ... by mvw · 1999-09-28 04:50 · Score: 5

looks like a cpu which reads foreign instruction sets and then translates them into its own set and execs them in a highly parallel manner to produce faster execution than the original processor.

TRANSlatingMETAprocessor?

Re:WAKE UP by Guy Harris · 1999-09-28 04:56 · Score: 4

(if you don't know, work out how you can add PPC to an AS400 and not recompile)

Much of the audience may not be familiar with AS/400's, so that's not necessarily much of a hint.

System/38 and AS/400 compilers generate code in a high-level pseudo instruction set; the low-level OS kernel, when told to run one of those programs, translates it into the native instruction set and runs that. (See Frank Soltis' Inside the AS/400; go to the 29th Street Press's home page and select "General Interest" under "*** ALL AS/400 TOPICS ***", and then look for that book, which they claim to have online - the URLs on that site look depressingly dynamically-generated, so I'm loath to make a direct link.)

This let them change the native instruction set from the apparently 360-flavored "IMPI" to an extended PowerPC instruction set without requiring people to recompile programs (unless they tossed out the pseudo instruction set code to save disk space).

From the various Transmeta patents, it sounds as if they're building a chip intended to be used in an environment making use of binary-to-binary translation, as the S/38 and AS/400 do, but it's not at all clear that they intend to use B2B translation in exactly the same fashion - they appear to be targeting existing low-level instruction sets, e.g. x86, rather than some high-level instruction set like the S/38 and AS/400 "MI".

Re:No you're all wrong: it's for Emacs by Imperator · 1999-09-28 05:17 · Score: 5

Actually, it's a way to run any application for any processor and any OS, straight from Emacs. Unrelated planned features for Emacs include improved SMB support, an extremely light-weight httpd, and preliminary support for USB child-rearing devices.

--

Gates' Law: Every 18 months, the speed of software halves.

Slashdot Mirror

20 of 345 comments