Transmeta Unveils 256-bit Microprocessor Plans
nam37 writes "PCWorld has an article about how Transmeta has outlined its initial plans for a new 256-bit microprocessor dubbed the TM8000. They claim it will offer significant advantages over their current TM5x00 line of chips. The processor will be a switch to a 256-bit VLIW (very long instruction word), allowing twice as many instructions in one clock cycle and greater energy efficiency." The article also touches on the popularity Transmeta enjoys in Japan, noting that 92% (CD: corrected from 55%) of the company's revenue comes from there.
92 percent of Transmeta's net revenue came from Japan, a figure which is up from 55 percent in the year earlier.
partly covering the subject, is here.
This is the size of the INSTRUCTION which is encoded, not the datapath.
Unfortunately, Transmeta is hampered by several factors.
The first is that 256b will require the translator to discover 8 translated instructions (assuming a 32b instruction size) which can be executed in parallel to get good performance. This is a TOUGH barrier; the reality is probably closer to 2-4. Also, the way to get more instructions to issue is through speculation, but too much speculation really hurts power.
Secondly, the Transmeta cache for translations and translating code is so small that it hurts quality. Transmeta would do better with OS cooperation, giving a larger hunk of memory to store more and better translations, and to enable more sophisticated translating algorithms. But that breaks the x86 compatibility model.
Third, they have lost the battle on performance, and power doesn't matter: Intel can outfab them and if REALLY low power was required/useful in the x86 world, Intel could crush them by simply dusting off the old Pentium core, process shrinking it to .12 µm, and shoving it out the door. Remember, if you shrink the processor power to 0, everything ELSE still burns a lot: screen, drive, I/O, even in an ultrasmall notebook.
Fourth, Transmeta's claims in the past have been so full of hot air that it's hard to see why we should believe anything they say now.
Test your net with Netalyzr
See IBM's research on the VLIW subject.
:)
"We developed an experimental prototype of a VLIW processor, capable of performing multiway branching and conditional execution, which is currently operational. The prototype has helped us investigate some of the hardware constraints in building VLIWs.
This processor executes tree-instructions within a ``classical'' VLIW architecture, that is, fixed-length VLIWs with preassigned slots for the different operations. The register state consists of 64 32-bit general purpose registers, 8 single-bit condition code registers, 4 memory address registers, program status word register, and some special registers. Each Very Long Instruction Word is 759 bits, which include..."
Now that we know the relationship between IBM and Transmeta, can you combine the results of these two 'projects'?
Unlike an Intel processor, the Transmeta chip is based on a RISC architecture. If you take a look at a CISC processor, like an Intel chip, there is a ton of work that just goes into decoding the instructions. Some instructions are one byte, others are two, some have data embedded in various bits of the instruction, etc. This makes the decoding and dispatching of instructions quite complex. On a RISC architecture chip, certain bits always indicate the instruction, others are always data. Decoding on these chips is simple.
Now, if you were to double the number of input bits on a CISC processor - you would have to duplicate some fairly complex (read: power hungry) circuitry. On a RISC processor, doubling the input bits simply doubles some simple hardware.
Still, that doesn't explain why 2x the bits yields an energy saving... The reason for that is that the concept of doubling the circuitry is a simplified explanation - some of the hardware can be shared. Really, they're just going to be feeding two instructions through in parallel, so for example, you only need to go through one power hungry bus cycle to get the data. You only need to run the dispatch unit once per two instructions, etc.
It's much like an automated car wash that uses a bunch of water and electricity. If you changed the design slightly, so that you could run two cars through at once instead of only one, you'd use more water and electricity than for one car, but not as much as if the two ran through separately.
It appears Ockham lost his razor and grew a beard.
In fact, I seem to recall that the original VLIW work in the 80s was done on 512 bit and 1024 bit designs, using bit slice components of course.
Large processors need large data and address buses, which means a lot of power hungry transistors on the periphery of the chip, as well as the longer array of the various bus gates inside the chip. The technical challenges in doubling the bus width are enormous.
In fact, a major feature of the Transmeta design is the way the internal compiler reviews code, rearranges it and caches the streamlined code for repeat execution. It means that, just like JIT compilation in Java, the first time through a loop is slower than subsequent accesses. The wider the instruction word, the greater the opportunity for this kind of rescheduling, but also the more cache memory is needed and the more the initial performance hit. Great for playing DVDs or database searches, not so good for office work.
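To make the "first pass is slow, repeats are fast" point concrete, here is a minimal sketch of a translation cache. It is only a toy model: the `translate` function, the `TranslationCache` class, the block format, and the address `0x1000` are all made up for illustration and say nothing about how Transmeta's software actually works internally.

```python
def translate(block):
    """Hypothetical 'translator': turn (op, operand) pairs into a closure.

    Stands in for the expensive step that is paid once per block.
    """
    ops = {"add": lambda acc, x: acc + x, "mul": lambda acc, x: acc * x}
    steps = [(ops[name], operand) for name, operand in block]

    def native(acc):
        for fn, operand in steps:
            acc = fn(acc, operand)
        return acc

    return native


class TranslationCache:
    """Translate a block on first use, then reuse the cached result."""

    def __init__(self, translator):
        self.translator = translator
        self.cache = {}   # block address -> translated code
        self.misses = 0   # slow path: translation was needed
        self.hits = 0     # fast path: cached translation reused

    def run(self, address, block, acc):
        native = self.cache.get(address)
        if native is None:
            native = self.translator(block)
            self.cache[address] = native
            self.misses += 1
        else:
            self.hits += 1
        return native(acc)


# A four-iteration "loop" over the same block of code.
loop_body = [("add", 3), ("mul", 2)]
tc = TranslationCache(translate)
acc = 1
for _ in range(4):
    acc = tc.run(0x1000, loop_body, acc)

print(acc, tc.misses, tc.hits)
```

Only the first trip through the loop pays for translation; the other three reuse the cached result. A bigger cache holds more blocks, which is the memory-versus-startup-cost trade-off described above.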
Panurge has posted for the last time. Thanks for the positive moderations.
Ah, you do not understand the concept of Very Long Instruction Word. Internally the chip's ALUs may be 32- or 64-bit, but with VLIW several instructions (whose results do not depend on each other) go in and are executed in parallel. That's a simplistic explanation. Here is stuff about the Transmeta chips and many other innovative and non-conventional designs. Look at IA-64 and Sun MAJC on the same page.
Stick Men
Transmeta Crusoe TM5400/TM5600/TM5800 5.25-inch SBC
http://www.ibase-i.com.tw/ib755.htm
They've got more Transmeta motherboards, including a CPU PCI board.
I bought the first one that came out and I like it. You'll have to find a way to mount it to an ATX case since it's one third the size.
Other Transmeta Products:
http://www.transmetazone.com/products.
Transmeta chips are VLIW, and therefore the bit width they are referring to is not the width of the data bus, but the number of instructions that can be executed simultaneously. At present the Transmeta chips are 128-bit (four 32-bit instructions), and the new ones will be 256-bit (eight).
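The slot arithmetic can be sketched in a few lines. This takes the figures in this comment at face value (fixed 32-bit slots packed back to back); the real Crusoe encoding is not described in this thread, and the function names and the choice to number slots from the most significant bits are assumptions made for illustration.

```python
SLOT_BITS = 32  # assumed slot size, per the comment above


def slots_per_word(word_bits, slot_bits=SLOT_BITS):
    """How many fixed-size operation slots fit in one instruction word."""
    if word_bits % slot_bits:
        raise ValueError("word size must be a multiple of the slot size")
    return word_bits // slot_bits


def split_word(word, word_bits, slot_bits=SLOT_BITS):
    """Split an instruction word (as an int) into its slots.

    Slot 0 comes from the most significant bits; that ordering is
    an arbitrary choice for this sketch.
    """
    n = slots_per_word(word_bits, slot_bits)
    mask = (1 << slot_bits) - 1
    return [(word >> (slot_bits * (n - 1 - i))) & mask for i in range(n)]


print(slots_per_word(128))  # current chips, per the comment
print(slots_per_word(256))  # planned chips, per the comment

# A made-up 256-bit word whose bytes are 0x00, 0x01, ..., 0x1f.
word = int.from_bytes(bytes(range(32)), "big")
slots = split_word(word, 256)
print([hex(s) for s in slots])
```

Nothing here touches the width of the registers or the data bus; the word is just a container for several independent operations.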
Since Transmeta chips are VLIW, they do not have to schedule instructions, and do not have to determine (at run time) which instructions can be executed in parallel. With VLIW, both of those functions are performed by _software_, statically, all at once. A significant amount of the complexity of a cpu is dedicated to performing these functions, which are offloaded to software by Transmeta in their "code morph" phase.
Furthermore, the conversion from outdated x86 microOps occurs in software during the "code morph" phase, further offloading functionality that otherwise would exist in silicon.
For these reasons, the transmeta CPU is dramatically simpler than comparable x86 cpus. Unfortunately, it did not perform as anticipated. However, since the die size is so small and the cpu so simple, it does offer some advantages (low power consumption, low heat dissipation).
I have a Toshiba Libretto with an 800MHz Crusoe chip in it and love it. You can actually run the thing for a few hours. Every other notebook has always said 2.x hrs but usually runs out in around 90 minutes.
But the best thing is the low amount of heat that the thing kicks out. Anyone who has ever sat with a P3/4 notebook on their lap for any amount of time knows how hot they get. These get a little warm after an hour or so, but not hot.
Bought mine in Japan, not sure what is available elsewhere.
Cheers.
Since you're the first person I've read tonight that is confused AND honest about being confused, I'm happy to take a stab at answering some of your questions. I am not a Crusoe expert, and my field isn't microprocessors. Just a warning.
1) What's a "true" 1024 bit processor?
You have to make assumptions to answer this question. Probably the most useful "bit"ness to know for a particular processor is the number of bits it can use for a "normal" memory address. For Athlons, that is 32 bits, and the same for the Intel P4. Some Intel chips have a 4 bit extension, but it's a pain to use and should be ignored (and mostly is). There are a handful of mass produced cpus with 64 bit addressing; the DEC^H^H^HCompaq^H^H^HIntel Alpha, some versions of the Sparc lineup, and certain varieties of IBM's POWER family come to mind. Since a memory address on typical cpus refers to one byte, having 32 bit addresses allows you to uniquely reference 2^32 (~= 4 billion) bytes with a single memory address. How much of that "address space" you can map to physical ram is an entirely different issue. Being "64 bit" typically also means you can represent every integer between 0 and 2^64-1 exactly.
In my experience (I do scientific computing, not enterprise stuff), the ability to address tons of ram from a single cpu is what really counts 99.99% of the time. We have a machine, a Compaq ES40 Model II, with 1 cpu and 14GB of ram. It can grow to 32GB of ram -- and the new version goes up to 64GB of ram (and the machine's a steal at $20K with educational discount -- I'm being serious, but things will change with AMD's 64bit x86 "Hammer" stuff at the end of this year). You can't do that in any sensible way on a 32 bit cpu.
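The address-space numbers above are easy to check. A small sketch, assuming byte-addressable memory as described; the helper names are made up, and 36 bits is simply 32 plus the 4 bit extension mentioned above.

```python
def addressable_bytes(address_bits):
    """Distinct byte addresses reachable with an address of this width."""
    return 2 ** address_bits


def human(n):
    """Format a byte count using binary units (integer division only)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


for bits in (32, 36, 64):
    print(bits, "bit addresses:", human(addressable_bytes(bits)))

# Largest unsigned integer that fits in 64 bits.
print(2 ** 64 - 1)
```

A 32 bit address tops out at 4 GiB, which is why a 14GB machine like the one above needs wider addresses to use its ram from a single flat address space.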
2) From what I understand from the other posts, this transmeta proc is not 256 bits in the same sense that Intel's current chips are 32 bits
True. The "instruction word" on most modern (RISC) cpus == "word" size == integer size == memory address size. In fact, this was one of the big simplifications propounded in the RISC paradigm. Note that modern x86 cpus are RISC based, even though their instruction set is CISC (you can look up CISC and RISC on the web; note that CISC was the right thing to do under certain conditions). The Transmeta Crusoe is *not* a RISC cpu. In some ways it is simpler. However, it requires *very complicated* software support, unlike RISC cpus (take this with a grain of salt). So when someone says that the Crusoe instruction word is 256 bits, you shouldn't make any assumptions about integer or memory address sizes (I don't know, but I assume these are 32 bits on the Crusoe -- 64 bit would be silly for the Crusoe's target applications). A single "instruction" for a Crusoe will (evidently) be 256 bits in the future. However, it will (evidently) be guaranteed that these 256 bits will be broken down into 8 smaller 32 bit instructions by the cpu. That is, 256 bits are fetched from memory (don't ask which memory) at once, which the cpu will interpret as 8 different things to do at the same time.
I'm not mentioning a lot of stuff, like variable width instruction encoding in the x86 instruction set, or how software converts files full of x86 instructions into files full of 256 bit Crusoe instructions, and certain efficiencies and inefficiencies of 64 bit cpus versus 32 bit cpus. My main point is that you shouldn't get hung up on the "bit"ness of a cpu unless you are writing software for that cpu. FWIW, 64 bit cpus are nothing new. I talked to a 70 year-old who claimed to have worked on experimental 64 bit machines in the 1960s or 70s for the military (I don't recall which military =-).
Since 2^64 is a *really* big number (where are those stupid "number of atoms in the universe" figures when you need them?), it's unlikely that we'll need memory spaces larger than 2^64 anytime soon. Same goes for integer sizes. Improved floating point precision from wider floating point types would be much appreciated by folks like me who are tired of working with crappy 64 bit doubles and can't afford to take the performance hit of wider fp types on 32 bit architectures.
As far as optimal width for instructions, I have no idea. If you want to make a big fat instruction, you better have a lot of good stuff to do at once. And that depends not only on the compiler that converts C (or whatever) into the cpu's instruction set, but also how the human chose to use C (or whatever) to implement her idea.
Computer history is full of people wanting to do something, computers catching up by removing performance bottlenecks, humans adjusting to the new machines, and then the whole thing repeats. Heck, at one time it wasn't clear whether digital computers were really a better idea than analog computers (however, I think this argument is over for general purpose computing), and analog computers don't have any "bits" at all.
Like I said, don't take anything I wrote above (at 5am while waiting for some code to produce output) as fact without double checking somewhere else. If you really want to get your head screwed on right, take an architecture course or (if you're really disciplined) work your way through something like Hennessy and Patterson's "Computer Architecture, A Quantitative Approach". You can get a lot of good info from 'popular' texts like "The Indispensable PC Hardware Book". A big warning about that book, though -- when the author writes "PC", he almost always means "PC when used with MS-DOS or Windows" -- often this is subtle, for instance when discussing the boot process or how memory is organized.
-Paul Komarek
Here, the 256 bits refers to the instruction word, not the data-word size. These are completely different things. If you're going by this, then your x86 could be considered up to a 48-bit machine or so. The TMTA chips are still 32 or 64 or 48 or something like x86 is. This is just going to mean that because it's VLIW, it can do 8 ops per cycle per pipeline stage instead of 4. Cool, but not any more revolutionary than anything else TMTA has done.
This is basically completely wrong.
The Transmeta machine is a VLIW machine, almost the antithesis of CISC. It is closer to what are called "superscalar" machines than anything else.
The idea is that you have a 256 bit INSTRUCTION, not data path. There are several different functional units. Maybe one is a multiplier/divider, another is a floating point unit, another is an address calculator. Maybe you double up each of these resources when you go from 128 to 256 bits. The idea is that each functional unit gets its own part of the instruction. VLIW stands for Very Long Instruction Word after all - not very long data path!
Next - you need fancy compilers, in this case it's the Transmeta just-in-time compilation that can schedule use of as many as possible of these functional units on a computation thread. Thus as the number of functional units goes up, the potential computation done per clock goes up.
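A rough sketch of what "scheduling as many functional units as possible" means: pack instructions into bundles so that no instruction shares a bundle with something it depends on. This is a simple greedy packer, not Transmeta's actual algorithm. It assumes every register is written only once (so only read-after-write dependencies matter) and that any slot can run any operation; the instruction names and the sample program are made up.

```python
def pack_bundles(instrs, width):
    """Greedily pack instructions into bundles of at most `width` ops.

    instrs: list of (name, dest_register, source_registers) in program order.
    An instruction is placed in the earliest bundle that comes after all
    bundles producing its sources and that still has a free slot.
    """
    bundles = []       # each bundle is a list of instruction names
    produced_in = {}   # register -> index of the bundle that writes it
    for name, dest, srcs in instrs:
        earliest = 0
        for reg in srcs:
            if reg in produced_in:
                earliest = max(earliest, produced_in[reg] + 1)
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(name)
        produced_in[dest] = slot
    return bundles


# Five independent loads feeding a chain of three dependent arithmetic ops.
program = [
    ("i0", "r1", []),
    ("i1", "r2", []),
    ("i2", "r3", ["r1", "r2"]),
    ("i3", "r4", []),
    ("i4", "r5", ["r3", "r4"]),
    ("i5", "r6", []),
    ("i6", "r7", ["r5", "r6"]),
    ("i7", "r8", []),
]

for width in (1, 4, 8):
    packed = pack_bundles(program, width)
    print(width, len(packed), packed)
```

For this sample program, going from 1 slot to 4 cuts the bundle count from 8 to 4, but going from 4 to 8 changes nothing because the dependency chain sets the floor. Wider words only pay off when the code has enough independent work to fill them.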
Have you compiled your kernel today??
Cache is huge. Find some closeups of processor cores, and you see that the cache of an average desktop processor covers up to half of the space, maybe even more.
That's not cheap to make, and no doubt power hungry, which is the reverse of what the Crusoe does best. Besides, there's no guarantee more cache will help given its current design - if you want a smokin' processor with lots of cache, use one that was designed for that purpose.
Opportunity knocks. Karma hunts you down.