Ars Technica Gets Into Crusoe
redmist writes "Ars Technica has a great, in-depth article about the new Crusoe chips. Enjoy." This one will answer most of the questions I've heard about Crusoe's guts, and how it differs from other microprocessors. "Must" reading for all hardware junkies!
but you have time to sit on slashdot blabbering about how much time you don't have ha ha ha ha ha ha ha
The cache has a fixed size (though it may be changed by the OS) and the code morphing software must do some cache management. If a piece of translated software is not used for some time (corresponding to some function in the app you're running), it may be removed from the cache to make room for a new piece of translation. At a given point in time, only a part of your app is present in the translation cache. So it's not possible to just "save the whole translation". Moreover, some parts of the app, such as initialization stuff that is run only once, might never get translated at all, because it's cheaper to just interpret them.
Anyway, the startup cost of translation is likely lower than the time it would take to retrieve the saved version from disk.
And anyway, low-end chips like the TM3120 are meant to run on machines without disks :-)
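To make the cache management described above concrete, here is a toy sketch. The real Code Morphing software is not public, so everything here is an assumption for illustration: the `TranslationCache` class, the least-recently-used eviction policy, the `hot_threshold` cutoff, and the string that stands in for translated code.

```python
from collections import OrderedDict


class TranslationCache:
    """Toy fixed-size cache of translated blocks (hypothetical, not Transmeta's design)."""

    def __init__(self, capacity, hot_threshold=2):
        self.capacity = capacity            # max number of cached translations
        self.hot_threshold = hot_threshold  # runs needed before translating a block
        self.run_counts = {}                # block address -> times executed
        self.cache = OrderedDict()          # block address -> translation, in LRU order

    def execute(self, addr):
        """Run the block at addr; report whether it was cached, translated, or interpreted."""
        self.run_counts[addr] = self.run_counts.get(addr, 0) + 1
        if addr in self.cache:
            self.cache.move_to_end(addr)    # mark as most recently used
            return "cached"
        if self.run_counts[addr] < self.hot_threshold:
            return "interpreted"            # run-once code never gets translated
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used translation
        self.cache[addr] = f"vliw<{addr:#x}>"  # placeholder for native code
        return "translated"
```

With a capacity of two, translating a third hot block pushes out whichever translation was used least recently, which is why only part of an app is ever resident at once.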
I'm sure the guys at Transmeta are just crushed.
In case you didn't get it the first time: _These_ chips have the northbridge built in. _These_ chips and their associated code-morphers are specifically designed for a single-CPU, ultra-low-power, fully-Internet-ready, DVD-playing _laptop_. Part of _this_ design called for (in the engineers' minds) a built-in northbridge.
_Future_ Transmeta chips don't have to follow these rules. Making an SMP-capable version is trivial now that they actually have real proof that their designs work. Drop out the northbridge, crank up the MHz, slap these on 2- or 4-CPU riser cards that share a rockin' bus (I'll take a quad machine to start with ;), and you've now gone from a "Mobile Intel" killer to a Xeon killer.
Even better, now it's six months down the road, and they come out with the Sparc version of the code-morpher. So you just fire up your web browser, download the new code, call your local Sun dealer, buy a copy of Solaris, flash the new ISA into the CPUs, and install!
But, Sun bites you on the ass, and two months later, after trying MacOS X, and really liking the underlying NeXT architecture, you see on slashdot that the PowerPC ISA is available. Cool!
Now it is three months later. Your 8 CPU machine is starting to seem a little behind the times so you buy the new 32 CPU Transmeta TM431434153 (comes in a nice minitower case with a 300 watt power supply, since by now they've gotten these things down to .1 watts). You install MacOS XIV.
But wait! There's another update! Alpha this time, and since by now Compaq has sold Alpha yet again, and it hasn't been made for six months, you can flash yet again, and you turn the old 8-way into a rocking Linux (kernel 4.2, BTW) desktop.
Stupid predictions aside, this is the future of computing, whether some here know it or not.
why do I need to see this here? I read it on the Ars site hours ago.
What a moronic statement. The same could be said for any of the posts that appear on /. I read C|Net news quite a bit, but I certainly don't piss and moan when I see a story on /. about an article I've already read.
Here's a really easy solution: Don't follow the link! Or better yet, create your own variant of /. that specifically excludes links to Ars. I even have a good name for it: "The Other Slashdot. News for nerds, except for those originating from Ars Technica, because we've already seen it."
It would be cool if there were an open source clone of the code morphing software so we could morph a PowerPC into an Intel x86 or vice versa... and easily portable to simpler CPUs like the StrongARM so we can get fast performance... that would be a cool project.
Will Linus Torvalds come after open source hackers who clone his software-patented technology? Dare Linus Torvalds take on the Open Source community? Will Linus threaten the community?
Consider what Hannibal wrote in his Crusoe article:
The sequential, x86 application and OS code is fed into the Code Morphing layer, which takes an entire group of x86 instructions at a time and renders a "translation." A translation is a hunk of x86 code that's been translated into native Crusoe VLIW code. This translation only needs to be done once ...
The Code Morphing software watches the translation code to see which pieces of it get used most often. The more a block of code is used the more time the Code Morphing compiler spends aggressively optimizing it, so that that block continues to run faster and faster with each use.
Now compare that to this description of the Java HotSpot Dynamic Compiler from Sun:
The Java HotSpot Performance Engine starts a program by interpreting the bytecodes. As the program runs, a profiler monitors the program to determine the most heavily used portions of the code.
Nearly all applications spend most of their time in only a small portion of their code. The Java HotSpot profiler identifies those parts of an application's code that are most critical for performance. Java HotSpot then compiles and optimizes the performance-critical ``hot spots'' without wasting time compiling seldom-used code. Furthermore, the runtime analyses also enable the compiler to perform native-code optimizations not possible with static compilers.
This seems like a marriage made in heaven!
Peter Robinson
(I've registered as Rodes but I don't have my pw yet.)
$3/month is about $100 over 3 years, which is how long I like to keep my machines.
My point: the price of the electricity is approaching the price of the processor!
Not to mention my desk space ... man a desk full of laptops is looking better than a giant 21" monitor all the time.
I'm excited, aren't you?
Cost is not the real limiting factor in an SMP configuration - it's bandwidth. To fit the requirements of SMP, all processors must have equal access to memory and I/O resources, which makes those systems the real bottleneck. Taking a closer look at memory architecture, you'll notice that memory is currently running at access times of 8-10 ns. Processors, which are now pushing 1GHz, will typically have access times below 2 ns. This is obviously a problem even in a uniprocessor system, as it would imply we complete an instruction fetch only once every 4-5 cycles. A bit of a major drawback when you've got multiple execution units sitting there waiting for instructions and data. The solution, which works exceptionally well, is to add cache, both on the processor and between the processor and memory.
This does, however, present a problem in an SMP system. Imagine, if you will, that processor 1 (P1) fetches a memory location (A) and writes it. Now P1 has a write-back cache, which means the modified value of A is written into the cache to be flushed later. Now imagine processor 2 (P2) goes to fetch the same memory location before P1 has flushed its cache. If it were to take it from main memory, it would get the old unmodified value. The way most commodity systems deal with this is to snoop the processor bus; i.e. the processor asserts itself on the bus, broadcasts the memory request to the rest of the processors as well as the memory controller, and, if the memory location is dirty, waits for the other processor to flush its cache line. There are variations on this architecture, such as adding a switch architecture to allow the actual memory transfers to occur point-to-point instead of over a shared bus. But the actual broadcast must be done simultaneously to all processors; i.e. it must be atomic.
This is not to say that integrating processor cores on the same die doesn't have merit. A lot of the tricks used to ensure cache coherency could be modified in light of the integration - the cache snooping, for example, could be done on chip. Or for that matter you could switch to a more exotic method for cache coherence, such as integrating a dedicated cache directory on die that would record what cache lines each processor core currently has loaded. The disadvantage to this is that you are now designing specialized hardware and adding to the complexity of the solution. Also, you have to take into account that any of these approaches increases die size, lowering the viable yield during manufacture. Likewise, the market for such a chip is considerably smaller than that for current commodity chips, thereby raising prices even more.
There are also advantages to clustering that cannot be met by SMP and related schemes (like CC-NUMA), chief among them reliability. Any single computer has the disadvantage of being vulnerable to a single component failure, despite the advent of things such as hot-plug PCI and RAID configurations. You are still vulnerable to bad processors and memory, which can be even more dangerous, as they may start corrupting data instead of simply not working. By using discrete independent units, a cluster architecture hopefully minimizes this issue to the point where you're virtually immune to a complete loss of availability. Instead, component failure results only in decreased processing power. For more information on the subject, I would suggest picking up "In Search of Clusters" by Gregory Pfister.
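The P1/P2 scenario above can be played out in a few lines. This is only a toy model of bus snooping with write-back caches, not any real protocol implementation: the `Bus` and `Cache` classes, the per-line `dirty` flag, and the invalidate-on-write policy are simplifying assumptions.

```python
class Bus:
    """Shared bus: every request is broadcast to all other caches (toy model)."""

    def __init__(self, memory):
        self.memory = memory  # main memory: address -> value
        self.caches = []

    def snoop(self, requester, addr, invalidate):
        for cache in self.caches:
            if cache is requester:
                continue
            line = cache.lines.get(addr)
            if line is None:
                continue
            if line["dirty"]:
                # The owner flushes its modified line before anyone else proceeds.
                self.memory[addr] = line["value"]
                line["dirty"] = False
            if invalidate:
                del cache.lines[addr]  # a writer needs exclusive ownership


class Cache:
    """Write-back cache attached to the shared bus."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> {"value": ..., "dirty": bool}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.bus.snoop(self, addr, invalidate=False)
            self.lines[addr] = {"value": self.bus.memory[addr], "dirty": False}
        return self.lines[addr]["value"]

    def write(self, addr, value):
        self.bus.snoop(self, addr, invalidate=True)
        self.lines[addr] = {"value": value, "dirty": True}  # flushed later
```

If P1 writes location A and P2 then reads it, the snoop forces P1's flush first, so P2 never sees the stale value sitting in main memory.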
So Mr. "Expert", how do you estimate 50% CPU cycles go to morphing? Please, enlighten us, oh mighty dopehead.
It probably also requires a lot less effort than using hardware alone to build a fast, modern CPU that is backwards compatible with a ~20-year-old instruction set.
I keep seeing people say they expect/want TM (Transmeta) to make a code morphing layer for PPC... um, how are they going to do that? Wouldn't they have to license the instruction set from Apple? And we all know how much Apple loves to give away the stuff for their hardware. They said no to the clone makers, and I would imagine they would say no to TM. Even if they did give TM the specs so TM could write a code morphing layer to run PPC apps, I bet the total cost of a Crusoe laptop running PPC would be more than a similarly configured PowerBook/iBook. Apple wouldn't shoot themselves in the foot and allow a competitor to come out with a cheaper product; they're a business, and they like to make money after all. Also, Ars said the northbridge and the SD-DRAM (that right?) modules were all integrated on die, so wouldn't that mean you'd need to change all of that stuff if you wanted to run a different architecture? After all, PPC doesn't run on x86 core logic sets. And of course the licensing would apply to any chip they wanted to write a code morphing layer for, unless of course the chip had open design specs, I suppose. Anyway, if I'm wrong about any of the above assumptions please lemme know. I'd love to have a cheap PPC laptop but I really don't think it's likely to happen :(
Anyone remember the Terminator 2 movie?
The chip that made it easier to develop self learning AI?
Well, uh: isn't this basically what Crusoe is doing? I realize they've got a few other features that Merced may not be able to duplicate, but Crusoe should be sufficient to prove that a chip need not be 'big and clunky' to execute x86.
Right, but the question was not "which will run x86 code better - Merced or Crusoe?" It was, what if I just want to bang on the VLIW directly? And my answer was that if you want to bang on the VLIW you should buy a chip that only does the VLIW in hardware (like Crusoe) instead of one that does x86 translation as well (like Merced.) And since Transmeta doesn't want people banging the VLIW directly, I guess that means you have to buy a Trimedia or something for this purpose.
Is it me, or is software, especially this complex "code morphing" stuff, prone to be buggy? Then what, you have to download new code morphing software? (I hate flashing ROMs.) It seems like another layer of complexity that can go wrong.
You're right, of course - just a little bit confused. (Un?)fortunately, nobody is claiming that the Transmeta Crusoe CPU is faster than an Intel Pentium III - because it's NOT faster! Transmeta's own benchmarks show that the Crusoe is not as fast as a Pentium III. What IS being claimed is that the Crusoe comes "close enough" to the performance of a PIII so as to be useful, with only 1/7th the electrical budget. That's a big deal for battery powered equipment! I.e., your laptop can run several times longer, etc.
I want to see Crusoe vs StrongARM.
StrongARM has no floating-point. It's also not x86 compatible.
But it might be even less power hungry than Crusoe. Next-gen StrongARM chips are expected this year, and will range from 150 to 600MHz, while consuming between 0.040W and 0.450W (well, that's what Intel has announced).
Ars is correct in saying that all facets of the chip race are about to change. They'd be fools not to push a server/workstation line directly into competition with Intel as soon as they can. They're not fools. The significant thing in my mind is that the folks at Transmeta really went and rethought a lot of concepts that everyone knew were good, and they added a new twist to it and created something unique. I often wondered at watching the Alpha market slowly slump when the RISC based chips consistently turned out blazingly fast benchmark numbers. They were maybe 2 or 3 generations ahead of Intel for a while, but lost it. Hopefully Transmeta is going to be the company that is finally able to not only introduce a cool technology like this, but also evolve it.
Doh! I should work on my reading comprehension. Yeah - I see your point.
In the process of answering someone else's question about the possibility of Crusoe SMP, I had a kinda neat idea for a Crusoe cluster - e.g. Beowulf. You can't really run the Crusoe in the traditional shared memory SMP sense because the "north bridge" is built into the chip. That is to say, instead of a processor bus coming out of the chip you get a PCI bus and an SDRAM bus. However, therein lies a cool possibility for a Beowulf cluster. Beowulf clusters communicate over a network fabric, so why not just use the built-in PCI bus of the Crusoe as that network fabric? You could put 4 Crusoe chips and a PCI bridge onto a PCI card and there you have 4 nodes of your Beowulf. Now plug a few of those cards into a passive PCI backplane and you have n multiples - 4 cards would be a 16 node Beowulf with just 4 PCI sized cards, a passive backplane, and a power supply. Of course, you would need some stuff like a hard drive to boot your OS - so I guess each card would need an IDE controller or something, still no big deal from the hardware standpoint. And a helluva fast network for interconnecting the nodes. Perhaps you could stick a gigabit Ethernet card into a fifth slot on that backplane and that could be your link out to additional 16 node boxes.
Unlikely. A multi-ISA chip would probably 'switch' between ISAs in the same way that a multitasking OS switches between tasks: All the register values for the outgoing task get stored, and new register values for the incoming task get loaded. If something like this is done for a multi-ISA chip, there's no need for register values from both ISAs to coexist in the CPU at the same time.
Or not - the Merced will be big, clunky, and slow because, just like every previous "recent generation" Intel CPU, it will have hardware to translate x86 to the Merced VLIW. A better statement might have been, if you want to code native VLIW, go buy a Trimedia processor - at least if you want an efficient chip that doesn't contain a bunch of x86 baggage.
I'm sure all the EEs are getting a chuckle out of your post. What makes you think hardware is immune to bugs? Remember the Intel F00F bug? Or how about Intel's IA32 floating point fiasco? If anything, the code morphing software can be stored in flash ROM and updated if found buggy. It also means that the hardware can be less complex, ergo less buggy hardware. Sounds like a win-win situation to me.
then why are you reading?
but rather communistic and zealous (in the linux persuasion)!
Code morphing software is stored in ROM (or FlashROM if you want it to be upgradable) and loaded to main memory at boot time. It's NOT stored in some mysterious part of the chip.
If the code morphing software is stored in FlashROM, you may lock a part of it so that it never gets overwritten. That part could then hold enough code to read a new FlashROM image, e.g. from a floppy disk. So even if you turn the power off during flashing, you can still use a recovery disk.
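A rough sketch of how such a locked boot block might decide what to run. The CRC check, the function name `choose_boot_image`, and the byte strings are all assumptions made up for illustration; nothing here is known about Transmeta's actual boot code.

```python
import zlib


def choose_boot_image(flash_image, stored_crc, recovery_image):
    """Boot the flashed image only if its CRC32 matches; else use the locked block.

    flash_image    -- bytes read from the rewritable part of the flash
    stored_crc     -- checksum recorded when the image was written
    recovery_image -- bytes of the locked, never-overwritten loader
    """
    if flash_image is not None and zlib.crc32(flash_image) == stored_crc:
        return flash_image
    return recovery_image  # a half-written or corrupt image falls through to here
```

An interrupted flash leaves an image whose checksum no longer matches, so the locked loader runs and can read a fresh image from the recovery disk.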
Seems like sometime in the past I was under some foolish impression that software was a lot more expensive to develop than hardware. I'm just wondering how this fits into this idea of pushing function that used to be in hardware up into software?
Today's high end processors are very complex. The design teams are huge, the development cycle is long. This kind of hardware IS very expensive to develop.
Moreover hardware bugs can be very costly... Just ask Intel about the Pentium division bug...
The answer is - NO! This CPU is clearly not designed for traditional SMP designs. The PCI "north bridge" and the memory controller are built right onto the Crusoe chip. Without access to the CPU main bus, you are kinda locked out of the traditional shared memory design.
However, you could do a cluster where each CPU has its own memory and runs its own process and the CPUs just communicate over some kind of network. That network might as well be the PCI bus, as it comes "for free" with the Crusoe CPU. You would be limited to a 4 CPU cluster though, unless you invest in a bunch of PCI to PCI bridges.
There's a basic risk here, though: from what I understand, the 'Code Morphing' software doesn't reside in main system memory - instead, it's in a special on-chip memory area, which is loaded from a ROM at boot time. So you replace the ROM with an EEPROM, and make it possible for users to cram a new instruction set in there. What happens if there's a bug in that new instruction set, or the flash process fouls up? Your computer won't boot. It won't even come close to booting - this isn't something you can fix with a bootable floppy, because the code to load the system on the boot floppy won't run any more. Now how do you fix it?
Actually, I suppose there's an obvious solution available: Make it so that the chip can load its 'Code Morphing' layer from either the EEPROM, or a hard-wired ROM, and make it possible to choose between the two with a jumper. If something fouls up, open the case, swap the jumper, reboot, and re-write the EEPROM. Still, this could be a big pain in the ass for people who aren't comfortable rooting around inside their computers.
Moderate this up, I haven't laughed this hard in a long time. And besides, if anyone can use the karma, it looks like fr0g can.
I really hate to nitpick here, but learn your x86 assembly!
:)
code from above:
ADD AX, BX
SUB CX, AX
JNZ Cx
You are a tad bit off in your last line.
a) you either need a 'test' instruction to check if the zero flag is set for jnz to work. I don't see a test or anything modifying the flags
b) jnz jumps to a label. cx is a register, doh! maybe next time
Ever hear of the FDIV bug??? How about F00F?
He's merely transferring his masochistic ways to the electronic realm.
if you read Ars, you may notice they link to slashdot for some of their stories, why not the other way around?
(...) if Crusoe can't run different "morphers" simultaneously (which I suspect it can't).
I suspect it can. If I were working at Transmeta, running two ISAs at the same time would be one of my top goals : if you can run x86 code and Java code at the same speed, you're a winner.
I heard they have been demonstrating Java programs running simultaneously with x86 code at the conference, but could not get much more details.
Excellent thought. Going along with your suggestion but pushing it a bit further: what Transmeta has done is decouple the ISA from the chip, which of course is a point that you already get, but I'll repeat it as motivation for what follows. With decoupling, a new class of engineers will be created: ISA designers. There are very few ISAs right now because of the enormous cost of porting everything that runs on top of them. With the appearance of code morphing, that cost is dramatically reduced to practically zero, since the system can be running multiple ISAs at the same time (with a few flash ROM cards). So this frees up ISA engineers to tailor instruction sets to application needs. In fact, I see a day in the future where most applications will be written in multiple ISAs. You might be asking why one needs multiple ISAs anyway. Well, just as different computer languages address different problems, different ISAs can be tailored too. If a particular component of an application performs no floating point calculations but does a lot of memory manipulation, the ISA for that piece can be optimized for that, and the corresponding code-morphing software will take that into account. Another part of the application might do a lot of arithmetic, so the ISA for that part might be optimized for that. So in the future, maybe 30 to 50 years out (it takes people a while to adapt to such a breakthrough tech), we will see a suite of virtual ISAs that never have been and never will be implemented in hardware. These virtual ISAs will be like C/C++, Java, Eiffel, etc. are today. (Of course by then I hope programming will be fully visual, but that is another story.) Email me at bineronbrain@netzero.com if you think my ideas are interesting.
in their section on VLIW processors they explicitly mention that some kind of emulation could be used to help the modularity / consistency of VLIW cpu.. that book was written a couple years back.. guess they will include more info on Transmeta now!
So I'm assuming that demo of quake3 they showed WAS running in software mode with some pretty fancy dynamic optimisations going on.
There's no chance the Transmeta CPU would have done the 3D calculations in software. No matter how much dynamic optimisation is applied. You simply need more FPU power to do this than the TM5400 seems to provide.
I mean, we don't have much data on the FPU, but we know it only has 1 such unit. So it can only launch one new FP operation per cycle, which is not enough for this kind of application.
Compare this with the Sony/Toshiba Emotion Engine (EE) which has 2 additional 128-bit SIMD units, each providing 4 single-precision FP operations. (BTW, the EE is not a VLIW, it's a classical 2-way superscalar).
As one of the undoubtedly many /. readers who also reads Ars, why do I need to see this here? I read it on the Ars site hours ago.
NOT CHEAP!!!!! what kind of chips are you buying?? or are you eating paint chips?
I was hoping for a chip that would ease the transition from x86 based software into generic Unix software. Imagine a chip that would allow an x86 compatibility mode but would run native software more efficiently. That would ease the transition in the same way that Windows 3.0 eased the transition from DOS to Windows applications by providing backwards compatibility for old apps. It appears that Transmeta is not interested in documenting the native instruction set for Crusoe, which means that there won't be native compilers, which means there won't be native apps that take advantage of the speed that the new chip offers. Now *perhaps* the knowledge about the efficiencies available in the translation layer will give Linus a leg up over the coders of other operating systems, but that knowledge doesn't translate to all the other coders working on Open Source software.
Anomalous: inconsistent with or deviating from what is usual, normal, or expected
Anomalous: deviating from what is usual, normal, or expected
Canard: a false or unfounded report
I pity the fool who don't like mr T! Mr T vs Slashdot Go Read it suckas!
Quit yo jibba jabberin' and come see Mr. T
Mr T vs. CmdrTaco would be funnier if you had Mr T saying "hella", like all the other, funnier Mr T vs. "whatever", instead of "helluva".
damn i'm picky.
Given their target of small mobile devices, like webpads and the like, its low power consumption and sleep mode, I don't think it's intended to be "turned off".
One possible problem is that the chip only has a finite number of registers (64, IIRC). So if you are emulating a chip with 40 registers, simultaneously emulating a second chip that needs more than 24 registers could cause problems. You'd probably store the extra registers in memory, which would slow the performance, but it wouldn't be any worse than a software-based emulator like vmware.
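The register budget in that argument is simple enough to check in code. A minimal sketch, taking the 64-register figure on faith ("IIRC") and using made-up guest register counts; the function name `spilled_registers` is hypothetical.

```python
def spilled_registers(physical, guest_register_counts):
    """How many guest registers do not fit in the physical file and must live in memory."""
    needed = sum(guest_register_counts)
    return max(0, needed - physical)


# Two guest ISAs sharing an assumed 64-entry register file.
fits = spilled_registers(64, [40, 24])    # exactly fills the file, nothing spills
spills = spilled_registers(64, [40, 32])  # 8 registers have to go to memory
```

This ignores any registers the translator keeps for its own bookkeeping, which would shrink the budget further.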
So even if Transmeta has no native C compiler, they still have a complete bootable operating system we can read.
And what C compiler does Transmeta use for Mobile Linux? Did they somehow remove the zillion lines of gcc'isms from the kernel code? Or is their compiler a derivative of gcc?
Meanwhile, I say: fuck the compatibility argument. I'm a big boy. If you tell me that my native VLIW binaries will crash and burn on the next model over, I can handle that. I'll recompile the program when I switch machines, but I want a native gcc, or the chip is not worth programming for.
Explain why Crusoe is ``cool''? My friend, for those who know, no explanation is necessary. For those who don't know, no explanation is possible.
just random thoughts...
Well, if you really read the Ars article, he makes it pretty clear that the Crusoe is NOT a high performance chip. It is designed to run "typical" applications (like Office) at a similar pace as a full blown Intel CPU but with lower power consumption. (It seems to me we have heard this line before - AMD tried to sell us all the K5 - or was it the K6 that was "the fastest engine ever designed for Windows apps." The problem was, if you wanted to run Quake instead of Office, the CPU generally sucked ass performance wise.)
I think the snake oil is pretty obvious if you look at the benchmarks Transmeta has published. They are showing some "relative" time to complete typical Windows tasks vs. an Intel CPU, and the Crusoe is losing - though not by much. We don't get any "standard" benchmarks like SPEC or Dhrystone or MIPS or MFLOPS because if they ran those, Crusoe's lack of processing power would just be all the more apparent. (Though it might come close if they ran those benchmarks as compiled native code as opposed to emulated x86.)
The reason they can get away with this, of course, is that you don't need a Pentium III 600 to run typical "Office" like apps - most of the CPU power on a chip like the PIII just gets burned up in system idle cycles anyway. Now, certainly the fact that Crusoe is low power is promising - a lot of people need a laptop that can run for 10 hours and they don't necessarily need to run Q3A full bore. It's also pretty cool that they put the "north bridge" and the memory controller on the same chip as the CPU - that's a really good idea, especially for the mobile market they are targeting. But this all doesn't excite me that much - does anybody remember the DEC StrongARM RISC? Another example of a chip that provides reasonably good performance from less than one watt of power - though it did not provide any kind of x86 compatibility.
Now obviously the Ars article points out that these aren't the ONLY CPUs Transmeta will produce. In the future they may build high performance workstation or server class chips. For now, I guess all the performance junkies can go back to drooling over the Alpha.
Just my 0.02
The reason to do the optimization on the fly is that by doing so you gain extra profiling information that is impossible to get at compile time. Dynamic optimization/recompilation allows the processor to improve the execution speed of blocks it executes frequently, and also do things like adjust caching schemes and do better speculative (or predicated, I guess) execution.
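As a rough illustration of the profiling half of that argument: the runtime can count which blocks actually execute and spend optimization effort only on the heavy hitters. The function `pick_hot_blocks`, the `min_share` cutoff, and the block names in the trace are all invented for this sketch.

```python
from collections import Counter


def pick_hot_blocks(trace, min_share=0.25):
    """Return the blocks that account for at least min_share of all executions.

    trace is the sequence of block names in the order they actually ran,
    information a static compiler never gets to see.
    """
    counts = Counter(trace)
    total = len(trace)
    return [block for block, n in counts.most_common() if n / total >= min_share]


# Hypothetical run: one init block, a tight loop, and an occasional draw call.
trace = ["init", "loop", "loop", "loop", "draw", "loop", "draw", "loop"]
hot = pick_hot_blocks(trace)  # "loop" and "draw" qualify, "init" does not
```

A static compiler has to guess which of these blocks matters; the runtime just measures it.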
Interesting that you should ask this question. In fact, Motorola makes an embedded version of the 603e called the MPC8240. It has a built-in 66MHz PCI "north bridge" and 100MHz SDRAM controller. Sounds a little like the Transmeta chip, except that it consumes more power and only runs "native" PPC code. The 8240 is generally used in applications like routers or other network devices.
Performance of the MPC8240 is in the range of 375 Dhrystone MIPS at 266MHz. Would be nice if we had a similar benchmark for the Crusoe, yes? As another benchmark, the StrongARM SA1100 comes in at about 250 Dhrystone MIPS at 220MHz - so similar performance. The StrongARM, of course, consumes less power (under 1 watt) than the MPC8240, but the 1100 does not have the built-in PCI bridge.
Of course, then you can get into the "higher power" CPUs like the PowerPC G4 - it sits at 825 Dhrystone MIPS at 450 MHz. Or, if you get into the SIMD vector processor, a billion floating point ops/second. That's pretty fast, though the chip consumes about 5 watts. Things like the Intel PIII and AMD Athlon provide about the same compute power as a G4, but consume MUCH more power - something in the range of 30 watts for these beasts. If you're going to consume that much power, you might as well get yourself an Alpha, which will give you double the performance of the Athlon on the same electrical budget. (You can't run x86 code native on an Alpha, but who gives a F* if you can get twice the performance for the same electrical budget?) Clearly a 30 watt CPU is well outside the notebook computer range. Obviously that's what "slow" low power chips like the Crusoe are for ;)
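To compare those chips clock for clock, it helps to normalize the quoted Dhrystone figures by clock speed. A small sketch using only the numbers given above; the dictionary layout and the function name `mips_per_mhz` are just for illustration.

```python
# (Dhrystone MIPS, clock in MHz) as quoted in the comment above.
CHIPS = {
    "MPC8240": (375, 266),
    "StrongARM SA1100": (250, 220),
    "PowerPC G4": (825, 450),
}


def mips_per_mhz(chips):
    """Dhrystone MIPS delivered per MHz of clock, rounded to two decimals."""
    return {name: round(mips / mhz, 2) for name, (mips, mhz) in chips.items()}


per_clock = mips_per_mhz(CHIPS)
# MPC8240 -> 1.41, StrongARM SA1100 -> 1.14, PowerPC G4 -> 1.83
```

Per clock the three land within a factor of two of each other, so the bigger differences in the comment are clock speed and power draw.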
The low power aspects of Crusoe *are* cool. But I am curious about what makes code-morphing "cool". VLIW is old hat, and code-morphing sounds suspiciously like a JIT: it recompiles x86 into a native VLIW format. As an example, take Sun's Java hotspot compiler, which adaptively recompiles one machine language format (Java bytecode) into another (x86 or Sparc or whatever.) Hotspot *also* does on-the-fly optimization and it also analyzes a running program for "hotspots" that need to be aggressively optimized.
I also note that Hotspot was heavily hyped and hasn't quite lived up to being the world-changing technology that it was supposed to be. I guess adaptive recompiling is harder than we thought...
Finally, VLIW *can* be damn fast. But what happens if you encounter a bunch of move instructions in a row, or a bunch of integer instructions, or whatever? Then only one of the four possible slots will be filled per clock cycle, while the other three instruction units sit around twiddling their thumbs, no?
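The worry about idle slots can be shown with a toy packer. Big assumption here: each of the four slots is tied to exactly one kind of unit (`alu`, `mem`, `fpu`, `branch`), which matches the picture in the question but is not a documented description of Crusoe's real slot layout. The packer is also naive: it works in program order and ignores data dependencies.

```python
SLOTS = ("alu", "mem", "fpu", "branch")  # assumed: one functional unit per slot


def pack_bundles(ops):
    """Greedily pack an in-order list of op kinds into bundles, one op per unit."""
    bundles = []
    current = set()
    for kind in ops:
        if kind not in SLOTS:
            raise ValueError(f"unknown op kind: {kind}")
        if kind in current:  # that unit is already busy in this bundle
            bundles.append(current)
            current = set()
        current.add(kind)
    if current:
        bundles.append(current)
    return bundles


def utilization(bundles):
    """Fraction of issue slots that actually hold an operation."""
    if not bundles:
        return 0.0
    filled = sum(len(bundle) for bundle in bundles)
    return filled / (len(bundles) * len(SLOTS))
```

Under these assumptions, four integer ops in a row fill one slot per cycle (25% utilization), while a perfectly mixed stream fills every slot. A real scheduler would reorder independent instructions to do better than this.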
IMO, we already have a portable low level language. It's called C! I also suspect that any reasonable C compiler will out-optimize a JIT/Code Morpher/whatever just about any day of the week.
Hey - if I'm wrong, somebody please educate me! It sucks being ignorant!
Rumor has it that they actually ported Linux to run on bare hardware, and it didn't really help enough to make it worth the trouble. Besides, a new version of Linux would likely have to be made for each different Transmeta chip (as the TM3120 and TM5400 have different instruction sets)
One thing that we may find, however, is that a certain architecture is emulated better than x86 (i.e. the PowerPC, ARM, or Alpha architecture may be easier to translate into native VLIW). Therefore it may be a better idea to run Linux over PPC/ARM/Alpha code-morphing software on a Transmeta chip (or maybe just a specific type of Transmeta chip works better, etc., etc.)
Boy, this gets confusing after a while.
On a somewhat different topic:
I kind of wonder if IBM is actually getting some technology from Transmeta. They moved the AS/400 from 32-bit to 64-bit (CPUs) a few years back and had to make sure the new systems were able to execute old code (actually, I understand that AS/400 machine code is abstracted from the object code of programs, though probably not in quite the same way as how Transmeta did things - if that makes any sense at all..)
--
Ski-U-Mah!
I understand there was a proof-of-concept demo at the Crusoe unveiling that would switch from x86 to Java bytecodes. I'm not sure if swapping between x86 and Java required a reboot or anything like that -- I wish I could have been at the unveiling so I could have seen that in person (but then I'm probably a terrible reporter ;-)
--
Ski-U-Mah!
The primary reason is that they don't want to have to make these chips backwards compatible. Intel has a lot of problems with this - even the newest Pentium III's must support programs written for 386s
Heck, a Pentium III can run 8080/8086 code (maybe even 8008 code or 4004 code!)
Since the morphing code is running in Flash ROM, it can be upgraded, but if someone tried to load a morpher that doesn't work they're gonna have trouble reverting back to x86.
Heh, the thing I think is cool is that you could start off buying a chip this year, and if a new technology (Like SIMD or 3DNow!) comes out, you can just go to Transmeta's web site or whatever, download the new instructions, and go run a program that uses the new instructions! (Well, presuming that Transmeta will support older chips and whatnot -- that could be a problem with having different instruction sets for each chip. How long do you support an instruction set?)
--
Ski-U-Mah!
I believe the morphing layer is compiled to native code. If that was the highest-performance way to build the morpher, then that contradicts your claim. (OK, so there is some minimal component that would have to be native to bootstrap the morpher.) Also, I estimate that 50% of the CPU cycles are spent running the morpher, so native code would get an automatic 2x advantage over x86 code.
We've heard about a ~650MHz TM chip being comparable to a 500MHz PIII. But the real question is, what fraction of the CPU cycles are running the morpher? That is a very interesting question, especially when comparing different morphers etc. A first guess could be: a PIII @500 is ~700 MIPS, and the TM gets about 2 x86 ins/cycle, so ~350 MHz are spent on application code and ~300 MHz on the code morpher. I'm damn impressed with a near or better than 1-1 ratio!
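The back-of-envelope estimate above can be written out as arithmetic. This is only a restatement of the poster's guess: the 700 MIPS figure and the 2 x86 instructions per cycle are the assumptions from the comment, not measured values, and the function name `morpher_share` is made up for this sketch.

```python
def morpher_share(tm_clock_mhz, target_mips, x86_ins_per_cycle):
    """Split a Crusoe clock budget into application and morpher cycles.

    tm_clock_mhz:      Crusoe clock rate in MHz
    target_mips:       x86 throughput to match, in millions of ins/sec
    x86_ins_per_cycle: assumed x86 instructions retired per Crusoe cycle
                       while running already-translated code

    Returns (app_mhz, morpher_mhz, morpher_fraction).
    """
    # Cycles per second (in millions) needed just to run the application.
    app_mhz = target_mips / x86_ins_per_cycle
    # Whatever is left of the clock is attributed to the code morpher.
    morpher_mhz = tm_clock_mhz - app_mhz
    return app_mhz, morpher_mhz, morpher_mhz / tm_clock_mhz


# The numbers from the comment: 650 MHz part, ~700 MIPS target, ~2 ins/cycle.
app, morpher, fraction = morpher_share(650, 700, 2)
print(f"app: {app:.0f} MHz, morpher: {morpher:.0f} MHz, share: {fraction:.0%}")
```

With those inputs the split is 350 MHz for the application and 300 MHz for the morpher, i.e. a bit under half the cycles. Changing the assumed instructions-per-cycle figure moves the result a lot, which is why this is a first guess and nothing more.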
And to beat a dead horse, the code morpher also optimizes. This is extremely important to the performance of Crusoe. It can actually run programs faster than if they were compiled natively, due to the run-time information available to the optimizer.
--
What Transmeta has essentially done is take the Merced core and execute the compiler at run-time. The alias handling structure acts like the ALAT on Merced.
Code executed through the translation layer should perform better than code executing on the bare metal because the translation software is learning and optimizing.
Think of it this way: would you rather manage your stock portfolio as is done today, by guessing what might happen, or would you rather know what the market is going to do and trade your stocks accordingly? I guarantee that I can beat your statically predictive management every time if I have that additional context.
--
The translation software provides backward compatibility, yes, but it also provides flexibility for Transmeta.
What if Transmeta designs the TM-ISA? It's a virtual machine designed to translate efficiently to the bare hardware. Now compilers can take advantage of the additional registers provided by TM-ISA. If a new core provides more physical registers, TM-ISA v.2 can be released, allowing the use of more registers by the compiler.
That's all well and good, but we get the additional benefit that old programs run on the new hardware just fine, and there's no additional hardware cruft to maintain compatibility.
Ok, that's pretty cool. Backward compatibility is important. But what's really neat is that Crusoe provides forward compatibility. Code written to TM-ISA v.2 will run just fine on processors released with TM-ISA v.1 as long as new firmware is loaded that can understand TM-ISA v.2. So now software houses can release code optimized for the latest and greatest without worrying about users behind the curve not being able to run their stuff.
How often do people moan about RedHat not providing Pentium-optimized packages? With Crusoe, RedHat can silence the critics without impacting us 486 users.
--
Note that there is no reason Crusoe couldn't support a staging compiler. Transmeta could always release a virtual ISA that had support for doing this efficiently. And of course you could always write a dynamic compiler in x86 (ugh). The point is that Transmeta could directly provide support for something akin to DyC in a later processor. And still maintain both backward and forward compatibility.
Pretty neat trick, I'd say.
--
I rather liked the idea that one poster suggested: rather than writing to the native instruction set, invent a new intermediate instruction set that is optimized towards making a better-performing code-morphing layer. It's a very interesting suggestion.
I also wanted to say that I'm surprised that more folks aren't really excited to read the insightful analysis at the end of the article where they gave a convincing argument for future transmeta chips that are not limited to the low-power mobile market. It had me salivating.
The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...
The article contains some good reasons for not doing it ahead of time in the compiler: with the code-morphing layer, you can keep real statistics on which blocks of code are actually used frequently, and whether or not a branch is likely to be taken -- under the actual conditions that the software is running. I know of no compiler that optimizes by running code with real data. Can it really be done? It just sounds like something best done dynamically to me.
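The bookkeeping described above (which blocks run often, which branches tend to be taken) can be sketched in a few lines. This is a hypothetical illustration: the `Profile` class, its method names and the threshold of 50 are made up here, and nothing in the thread says how the real code-morphing software stores its statistics.

```python
from collections import defaultdict


class Profile:
    """Run-time counters for code blocks and conditional branches."""

    def __init__(self, hot_threshold=50):
        # hot_threshold is an arbitrary illustrative value.
        self.hot_threshold = hot_threshold
        self.exec_count = defaultdict(int)    # block address -> times run
        self.branch_seen = defaultdict(int)   # branch address -> times reached
        self.branch_taken = defaultdict(int)  # branch address -> times taken

    def record_block(self, addr):
        """Note one execution of the block starting at addr."""
        self.exec_count[addr] += 1

    def record_branch(self, addr, was_taken):
        """Note one execution of the branch at addr and its outcome."""
        self.branch_seen[addr] += 1
        if was_taken:
            self.branch_taken[addr] += 1

    def is_hot(self, addr):
        """True once a block has run often enough to be worth optimizing."""
        return self.exec_count.get(addr, 0) >= self.hot_threshold

    def likely_taken(self, addr):
        """True if the branch was taken in more than half of its executions."""
        seen = self.branch_seen.get(addr, 0)
        return seen > 0 and self.branch_taken.get(addr, 0) * 2 > seen
```

An optimizer built on top of this would only spend effort on blocks for which `is_hot` returns true, and would lay out code so that the `likely_taken` direction is the fall-through path. The point is that both answers come from the actual run, not from a guess made at compile time.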
The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...
The point behind the Crusoe is not, not not NOT, to just be a better faster chip that optimizes better and consumes less power than those on the market now (though it is.)
The Crusoe's selling point is compatibility. Transmeta can churn out all sorts of chips, some optimized to sip current from batteries at a tenth of the rate of today's monsters, some designed to guzzle power even more and be speed demons. They can make radical changes to the basic design of the chip while doing this, and it won't matter, because though the way things are done internally may go topsy-turvy, the instruction set won't change, and the same programs can be run on each.
This neatly solves the drag placed on development by the need for backwards-compatibility (Want to run DOS 3.3 on your Athlon? You can if you feel like it.) Just like Windows, x86 chips have accumulated baggage - the sediment of silicon long since passed into figurative dust.
Transmeta has designed a beautiful thing - a chip that transcends backwards-compatibility. Writing to the bare metal on the Crusoe bolts it down, turns it into just another fixed-in-place bit-smashing engine. Kills it, in other words, removes what makes it an elegant hack.
Don't do it. Please.
Hrm, perhaps because there are folks such as I, who not only do not regularly read Ars Technica, but also aren't whiny bastards such as you?
Stating on Slashdot that I like cheese since 1997.
OK, Transmeta have proven that they are pretty damn good at keeping secrets, so I would take the info obtained from that Usenet thread with a decent sized grain of salt (as opposed to most other Usenet "wisdom").
Ahh - My eye!
The doctor said I'm not supposed to get Slashdot in it!
After hearing media reports that varied from referring to Linus as a "key executive" within Transmeta (he's not a corporate executive, which should be obvious to anyone who's viewed the web site or bothered to read the press package distributed at the launch), to describing Crusoe as "Internet-powered" and then asserting it draws its electrical power from the Internet itself, it's nice to see that someone with a clue actually sat down, read, analyzed, and reported on the technology that we introduced yesterday.
First thing is that you won't have many useful instructions to do what you need with the simple native instruction set that Crusoe provides. So you would need to be creative and optimize your code very well to get the speed you are looking for. Remember, your code-optimizing abilities are competing with a very advanced code morphing technology. Next, the few clock cycles that you are going to gain by going native are not worth the programming effort.
Here is what needs to be done instead. Design an instruction set specific to the application that you are writing. Our current CPUs handle very broad tasks and try to be good at everything, and when they can't, things like MMX, 3DNow and whatnot start to show up in the CPU.
So, if you know the box you are setting up is going to be a web server, design an instruction set that a web server would fly on. If you play games, design an instruction set that looks like 3DNow on steroids.
ayottesoftware.com
I suspect that the translation units are based on so called 'basic blocks' which can most easily be described as anything in between a target label and a branch (i.e. entry and exit points in your code). This would allow optimisation of loop bodies.
This can be extended by going to 'super blocks' (multiple basic blocks), allowing sophisticated things like loop unrolling, software pipelining etc.
What I'm actually interested in is how the translation cache is being accessed. In a later post somebody states that the translation cache is maintained in main memory (therefore benefiting from the regular data cache). I'm not sure I understand how it is possible to do efficient cache lookups in this way. I assume they use hashing methods to map x86 memory pages to 'translation cache lines', but this has a much higher overhead than hardware-based cache lookups.
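For what it's worth, here is a minimal sketch of the kind of hashed lookup guessed at above. Everything in it is an assumption made for illustration: the `TranslationCache` class, the `translate` callback and the per-page index are invented here, and the only hard fact used is that x86 pages are 4 KiB. It says nothing about how Transmeta actually does it.

```python
PAGE_SIZE = 4096  # x86 small-page size in bytes


class TranslationCache:
    """Software cache mapping x86 entry addresses to translated code."""

    def __init__(self, translate):
        # translate is a hypothetical callback: x86 address -> translation.
        self._translate = translate
        self._by_addr = {}   # hash table: x86 entry address -> translation
        self._by_page = {}   # x86 page number -> set of entry addresses
        self.hits = 0
        self.misses = 0

    def lookup(self, x86_addr):
        """Return the translation for x86_addr, translating on a miss."""
        if x86_addr in self._by_addr:
            self.hits += 1
            return self._by_addr[x86_addr]
        self.misses += 1
        translation = self._translate(x86_addr)
        self._by_addr[x86_addr] = translation
        page = x86_addr // PAGE_SIZE
        self._by_page.setdefault(page, set()).add(x86_addr)
        return translation

    def invalidate_page(self, page):
        """Drop every translation whose entry point lies in the given page,
        e.g. after the x86 code in that page has been overwritten."""
        for addr in self._by_page.pop(page, ()):
            self._by_addr.pop(addr, None)
```

The hash lookup in `lookup` is the software overhead being worried about. One plausible way to hide it, again only a guess, is to patch translated blocks so they jump straight to each other, so that the table is consulted only when control reaches an address for the first time.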
I have also been a bit surprised by people being worried about losing the cached translations when powering off a system. People, we're talking here about loops that are being executed 100s if not 1000s of times. Having to do the translation again for the first few iterations is not going to be the big performance loss they seem to think it is!
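The amortization argument can be put into numbers. The sketch below is a simple cost model with made-up inputs: the function names and the cycle counts in the example are hypothetical, chosen only to show the shape of the trade-off, not taken from any Crusoe measurement.

```python
import math


def break_even_iterations(translate_cost, interp_cost_per_iter,
                          native_cost_per_iter):
    """Iterations after which translating beats plain interpretation.

    All costs are in the same unit (say, cycles). Returns None if the
    translated code is no faster per iteration than interpreting.
    """
    saving = interp_cost_per_iter - native_cost_per_iter
    if saving <= 0:
        return None
    return math.ceil(translate_cost / saving)


def overhead_fraction(translate_cost, native_cost_per_iter, iterations):
    """Share of total run time spent on the one-off translation."""
    total = translate_cost + native_cost_per_iter * iterations
    return translate_cost / total


# Hypothetical numbers: translating costs 1000 cycles, an interpreted
# iteration costs 20 cycles, a translated one costs 2 cycles.
print(break_even_iterations(1000, 20, 2))
print(overhead_fraction(1000, 2, 100_000))
```

With these invented figures the translation pays for itself after a few dozen iterations, and over a hundred thousand iterations it accounts for well under one percent of the run time. Redoing it after a power cycle costs the same small amount again.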
Maarten Boekhold
'whatshisface' would be David Patterson, who together with David Ditzel authored 'The Case for the Reduced Instruction Set Computer', the article which started the whole RISC thingie.
Maarten Boekhold
I also wonder whether it can multitask between different instruction sets. I guess the task switching overhead would be pretty brutal if there isn't room onchip for multiple instruction sets.
I would guess not, since there is only a single TLB, configured at boot time. Unless you wanted to flush it every time you changed instruction sets (!)
Just junk food for thought...
Still, this could be a big pain in the ass for people who aren't comfortable rooting around inside their computers.
IMHO, people who aren't comfortable "rooting around inside their computers" probably won't be writing their own code morphers. This isn't script kiddie stuff...
Just junk food for thought...
It would seem to me to take some work on top of what was released to be able to attack the server CPU market.
...
I'd love to see these happen in the next five years:
- Code Morpher for Alpha, PPC,
- Code Morpher to recognize the instruction set of a binary
- "optimization practically finalized for this piece of code" bit
- a TM CPU bus for several chips to share the same translation cache
(how necessary is this actually?)
- communication interface for operating systems
- ability to save final VLIW version of code beside the original binaries
Those would in essence offer the ability to eventually turn a system over to VLIW binaries without actually putting any effort into it.
Once TM has covered its development investment:
- Open Source the Code Morpher
-> worldwide development of support for
- any chips
- integration with high-level compilers.
"No stop signs! No speed limits!" - AC/DC: Highway to Hell
I think, therefore thoughts exist. Ego is just an impression.
Coding natively would be SLOWER than using the morphing layer. You also don't get the benefit of the optimization.
Yes, but a good compiler will generate fully optimal code to begin with. A compiler that targets the Transmeta core Instruction Set should give you better code than the two level translation scheme.
But that's neither here nor there. Transmeta will not want people to code to the native Instruction Set because it will undermine their flexibility with the underlying hardware. Right now, the major benefit of the two level translation scheme is that the hardware architecture can be updated and improved while presenting the same programming model to application developers. This will allow Transmeta to aggressively experiment with the hardware architecture while maintaining software compatibility. This is very very cool!
If you read the FAQ, it explicitly states that the source will be released.
Only time will tell whether Transmeta's making us pay a penalty up front in the form of morphing, so that they don't have to deal with backwards compatibility in future, will pan out for them from a business point of view. If all this thing does is run x86 code at lower power, they aren't going to have a market lead for long. Two things are happening right now, guaranteed:
- Somebody is reverse engineering it
- The big boys are doing the same damn thing as fast as they can
One of these two items will cut Transmeta's legs out from under them. Unless they get a killer app for the CPU and penetrate the market as quickly as possible, I'm not sure there's enough here to justify the effort they've gone to (read: the VC dollars pumped in). I mean, if you'd just sunk $100 million into a company over 5 years and they came out with a slower x86 clone, what would you think?
Oh, and I guess I have a question too. Seems like sometime in the past I was under some foolish impression that software was a lot more expensive to develop than hardware. I'm just wondering how this fits into this idea of pushing function that used to be in hardware up into software?
Frank W. Miller
Evidently you haven't been reading much of the Crusoe propaganda. They don't want anyone to access the native instruction set so that they can change the chip core without having to worry about legacy apps. Imagine a chip that could go from pure CISC to RISC without having to change the apps. In this way the hardware implementation is decoupled from the instruction set interface.
Pretty neat, but I haven't seen any real mention of emulating any architectures other than x86.
Scuttlemonkey is a troll
The Ars article also points out that some of the registers are used by the code-morphing software, too... you couldn't count on having 40 + 24.
--
Brent J. Nordquist N0BJN
Certainly not the type of chip I want to be playing quake on... For now.
The telling quote is at the end of the article though:
I'd say that it's only a matter of time before we hear an announcement of another product line from Transmeta. It won't be named Crusoe, because it won't be aimed at the mobile and embedded markets. It'll be a workstation and server class x86 CPU that runs Linux like a fiend, and it'll compete directly with Intel's IA-64. I can't wait.
It does make me wonder though, if such a chip (Slightly altered) would actually end up being superior for Quake. Given that the translating software is able to identify which parts of cache are used more often it becomes better at branch prediction, this could translate into faster gaming... I think... Contrary to this thought though is the fact that the Celeron is a good gaming processor with 128K cache... We shall see.
Along similar lines, if the x86 instructions are software, how much of the x86 instruction set does Quake use? Would the flexible software end up speeding up Quake by getting the x86 instructions out of the way?
Try to hack my 31337 firewall!
It will probably be out, but Linus does not have to release it. Still, you know he will.
Try to hack my 31337 firewall!
You lot just aren't getting it. If you remove the code morphing layer, then you have to put backwards compatibility into the hardware down the road.
Not really. Transmeta could then just write a code-morphing layer to "morph" the ISA you coded to into the new one. No?
I don't care if it's 90,000 hectares. That lake was not my doing.
No, no, please! That would be a disaster!! The whole point of this architecture is to get rid of this compatibility mess.
Well, someone will still have to suffer the incompatibility mess: those who write morphing software for various CPUs. This will surely be more than just Transmeta, if the concept takes off.
The x86 instruction set isn't necessarily the best for this chip. Someone could make up a different one (perhaps something that uses 32 registers or so), make a compiler for it, and have better performance than x86 code on the same chip.
This would have to be rewritten for another chip, but rewriting the instruction emulator is a lot less effort than recompiling the os and all apps. Still, someone must do it.
Transmeta could then just write a code-morphing layer to "morph" the ISA you coded to into the new one. No?
Brilliant! The Meta Morphing Power Processors! Why stop there. Why not have transmeta write code morphing software that emulates their native instruction set and on top of that run code morphing software that emulates their native instruction set and on top of that run code morphing software that emulates their native instruction set and on top of that run...
IT'S TURTLES ALL THE WAY DOWN!!!
--Shoeboy
way Way WAY back in micro-processor terms (1984-1985), I developed a white paper that attempted to extrapolate where PC's would develop by Y2K. (I'll put it up on my website if I can find which 5-1/4" floppy I saved it on, and re-hook a 5-1/4" drive to my PC).
Hopefully it doesn't seem self-congratulatory (because a number of my other conclusions stunk) or redundant to this thread to mention that three or four of the paper's conclusions fit the idea of developing a Crusoe type "beowulf in a box" exactly:
- High speed, low power CPU cores would be required ( 200 Mhz speed). Why? Because even if I had the ability to write programs that could keep all the Crusoe processors running at full tilt 100% of the time, I could conceivably power 50 Crusoe Processors or so on the same power supply that used to supply two Athlons (68 Watts),
- The CPU units would perform on-chip instruction decoding so that chip and system architectures could be developed more flexibly,
- Each CPU would have an abundant amount of cache memory in which to put commonly executed code units, and finally that no matter what,
- for performance, massively parallel execution was more important than raw speed in terms of overall CPU speeds, etc.
Now then, programming for massively parallel systems is a b----, and I couldn't do a Beowulf cluster if I tried, but these chips and the StrongArm series are the first ones which met all of the specs in a fifteen year old paper. Just in time for Y2K. Interesting, eh?
...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
Bravo /. that is the kind of stuff I wanted to read about the chip.
While a lot of people are concentrating on how well this will work in small devices the author of this article is excited about the large-scale applications of the chip. I would have to agree. Think about a busy web server that is continuously generating web pages and doing database transactions. The code morphing software can spot that trend and be ready for it.
Should be interesting.
MBrod
If yes, where is the source?
The only reason all cover-ups appear to fail is that you never hear about the ones that succeed.
The time it takes to relearn the optimization is very very short when compared to power on/off cycles.
-- these are only opinions and they might not be mine.
I'm not a hardware guru so pardon the speculation...
Obviously, the code morphing is focused on x86 right now and, as the article suggests, may be adapted for PPC, Alpha, etc. in the future. Is it feasible that it could also be adapted for specialized processors such as graphics or sound?
I'm imagining an SMP-type of Transmeta box that, when you load Quake, automagically loads code morphing software onto one of the processors to act as the graphics accelerator or, if you're watching a DVD, can act as an MPEG decoder card
Is what I'm suggesting conceivable or am I way off base?
So, we know by now that Crusoe only requires around 1 watt of power to operate, and that this results in a maximum temperature of 48C (thus, a fan isn't required to cool the thing). But, if you don't really care about power consumption and you installed a fan over the heatsink, one has to wonder how much faster these things can be clocked before they start showing glitches. The only problem I can see is the LongRun software which will automatically reduce power consumption if it's not necessarily needed -- this might mean that the only way to overclock the chip would be to modify the LongRun code (stored in FlashROM).
Any guesses as to how long until someone figures out how to patch the FlashROM so to allow overclocking? I give it about 6 months after Crusoe based systems hit the shelves.
-NooM
Even if you think Assembler is a high level language, you probably do not want to code directly to the bare metal. It is not a nice native VLIW machine code. It is the target for the code morphing layer. It's been a long time since I've even looked at microcode (early low-end IBM 370s were microcoded) but it tends to be obscure, twisted, very unfriendly, and I cannot imagine that it's gotten any better with time. Minor mistakes do very bad things. Only one program is written, the program to read and execute the "higher-level" machine code.
To nit-pick,
SUB CX,AX
sets flags based on result in CX
If things are case sensitive, Cx would be a valid label.
Actually, both ADD and SUB set flags on x86.
Yes, but a good compiler will generate fully optimal code to begin with.
How do you prove a whole program (as opposed to a short algorithm like quick-sort) is optimal? Isn't this a very hard problem to solve?
I strongly believe that trying to be clever is detrimental to your health. -- Linus Torvalds
What if the code morphing software authors (Hi Linus!) decided to 'extend' the X86 base instruction set just a bit, like 3DNow, MMX and so forth, only not to provide graphics acceleration, but to provide 'hints' to the code morphing software?
In my imagination, there need be just a few of those extra instructions, and a clever compiler can stick them in to provide the code morpher with some difficult-to-find but static dependency/ordering information (i.e. potentially anything not needing run-time statistics gathering).
If these hints are absent, the morpher just does its regular job. Now, good idea or not?
All generalizations are false, including this one. (Mark Twain)
I'm surprised this chip wasn't sold as "the first nature friendly chip."
I've heard all the statistics about 20,000,000,000 tons of coal being burned an hour to support the internet's routers, and a hundred times that to run our desktops.
They should be getting any partner they can, and if some tree-hugger organization sells your chips for you; it ain't bad.
One point the chap from Ars Technica misses is the heat output from laptops.
I know that I for one am looking forward to a cooler laptop.
Andrew.
There's a basic risk here, though: from what I understand, the 'Code Morphing' software doesn't reside in main system memory - instead, it's in a special on-chip memory area, which is loaded from a ROM at boot time. So you replace the ROM with an EEPROM, and make it possible for users to cram a new instruction set in there. What happens if there's a bug in that new instruction set, or the flash process fouls up? Your computer won't boot. It won't even come close to booting - this isn't something you can fix with a bootable floppy, because the code to load the system on the boot floppy won't run any more. Now how do you fix it?
How do you fix it? Well, if this code-morphing software is on a flash ROM so that it can be rewritten, it would be in essence like having a second flash BIOS. There would be no greater threat here than simply flashing your BIOS. It's a pretty safe process if you take a little care.
Wrong. SMP relies on shared memory, so doesn't scale as well as a truly parallel (ack, I can never spell that corecktly) architecture.
For SMP, the performance increase as you add extra chips decreases, and tails off dramatically at a relatively small number of chips (12 IIRC) due to the communication bottle-neck (this is for an OS that handles SMP well, Linux doesn't scale at all well ... yet). Parallel machines, on the other hand, give a theoretically linear increase in performance as you add more nodes. This is why 'proper' super-computers use parallel transputers, rather than just building big SMP machines.
--
One thing that we may find, however, is that a certain architecture is emulated better than x86
Or even create a new 'abstract' instruction set that is architecture-independent, hence clean and probably faster to emulate than trying to emulate code optimised for different hardware (hell, we might even see the kernel re-written in Java, so it can run it as byte-code {j/k}). On a side note, do you think code that's been aggressively optimised for a certain architecture will run faster or slower on Transmeta than code that's just been 'normally' optimised?
--
Ever seen an Avalon box? Lots of Alphas on small boards w/ lots of memory plugged into a fast backplane. You don't need a built-in hard drive, you just need a workstation to "feed" the cluster - you have a nice fast workstation hanging off a nice fast network connection (Myrinet seems to be popular for this - it's what Compaq was using in the Beowulf cluster they had at the Atlanta Linux Showcase). You just need some way to load a "bare" OS so the processors can start talking to the network - and (at least in theory) the linux distro. that's "built into" some of these chips could do that.
'Course, as others have pointed out, the chips unveiled yesterday aren't high-end, but that doesn't mean they won't have a high-end design in the future. :>
ObTagLine: The more you run over the 'possum, the flatter it gets.
Inter-node communication is far more important than heat or space. In fact, I would say that heat and space are two of the smallest problems with Beowulfs.
It's basically a good compilation of the technical specs... what I really want to see is the power consumption comparisons between the TM5400 and the PIII... that was the coolest part and they don't even have a diagram anywhere... but it's a great in-depth look. Can't wait to get my hands on one of the devel specs kits.
We are the music makers, we are the dreamers of dreams
JediLuke
-Do or Do Not, There is no Try
I couldn't care less about laptops or handhelds. Low power, high performance chips are always good, though, especially if they're cheap and have a gimmick (like, say, emulating other architectures).
So, when can I buy an SMP motherboard with six or eight of these Transmeta processors on it? Intel is the only game in town for low-cost SMP, and it's not very low-cost at all, IMHO. Have you priced the Alpha lately? There are Alpha CPUs being discontinued that cost more than my whole SMP P3 U2W SCSI workstation! Funk dat!
I bet I could build a quad Xeon system with Ultra160 RAID and 21" monitor for the price of a barebones 21264. Factor in a second CPU and SMP Alpha board, and I could have a beowulf cluster of quad Xeons plus 21" monitors, RAID, and gigabit ethernet. Bah.
Why limit yourself to choosing one at boot? Why not have a multi-chip module with four Crusoes in it and emulate several architectures simultaneously?
There's a very interesting paper on the Crusoe and the Very Long Instruction Word (VLIW) processor it uses here. It's in .pdf. The coolest bit is its IR pictures of both a PIII and a Crusoe playing a DVD with software. The PIII is operating at a max of 105.5 Celsius, the Crusoe at 48.2.
Insanity is contagious. - Yossarian
This article was very informative overall. I found particularly interesting the bit about Transmeta asking for a new set of benchmarks that takes into account efficiency as well as performance.
If Transmeta could afford to hire Linus Torvalds to create a Mobile Linux for their CPU, why couldn't they hire someone to create benchmarks for their CPU? That leads me to wonder, are there any good open source benchmarking programs? Perhaps if an open source one became popular, it would be easily modified to do benchmarking for new processors like this.
It seems that what Transmeta has done is to take the ideas developed for JIT compilers and apply them all to hardware. Pretty neat stuff.
yeah well, I thought the whole point of this was about more efficiency and power consumption; not about raw "pedal to the metal" speed.
>>>>>> Chewie, take the professor in the back and plug him into the hyperdrive.
I'd heard some info about what Transmeta's chips were doing ahead of time (under non-disclosure, of course), but never had enough information to figure out what they were doing that was all that different from microcode, which of course has been around for years and years.
Props to Ars for such a clear write-up.
And props to Transmeta for rethinking the problem.
-----
Klactovedestene!
Are you crazy? Have you seen CPU prices lately?
"Nobody owns the fucking words man." - James Dean
They have essentially built a Japanese Compact Car that is fuel efficient, and not an Italian sports car.
It's like the Mazda RX-7 of microprocessors!
It does it a little different and a little better.
There is a lot of glitzy information now available about Crusoe: VLIW, a core instruction set that is nothing like x86, and the code morphing software. But the actual technical nitty gritty seems to be lacking. Can a program get access to the core instruction set, thus bypassing the code morphing?
No, no, please! That would be a disaster!! The whole point of this architecture is to get rid of this compatibility mess. They've already done two different instruction sets. Every new processor from Transmeta will likely use a new instruction set, optimized for whatever the processor is designed for. Don't you see the advantage of this?? Well Transmeta does, so they won't be releasing their compiler, or any specs.
Elbrus E2K uses a similar translation technique, and is said to have seven times the performance of an Alpha. So it's not just about power, Transmeta will probably make faster chips in the future. It's probably not a coincidence that Dave Ditzel used to work with Elbrus in Russia back when he was working for Sun. Personally, I think Dave Ditzel is a bit embarrassed that the Crusoe isn't faster. This guy used to do chips for UNIX workstations that nobody could afford, back when we were using the C64 and Spectrum. Look out for fast chips from Transmeta in the future. And as the article points out, there are some hints in that direction.
"Bernoulli was wrong. X proves that you can fill a vacuum, yet still it sucks." - Dennis Ritchie
One thing that we may find, however, is that a certain architecture is emulated better than x86 (i.e. the PowerPC, ARM, or Alpha architecture may be easier to translate into native VLIW). Therefore it may be a better idea to run Linux over PPC/ARM/Alpha code-morphing software on a Transmeta chip (or maybe just a specific type of Transmeta chip works better, etc., etc.).
I have thought about this also. I don't want this old legacy stuff, although the core is all cutting-edge and all that.
You could design a "pseudo-architecture", that is optimized to be translated in realtime by code morphing software, rather than executed in hardware, like current architectures. That is what Elbrus did with their E2K.
"Bernoulli was wrong. X proves that you can fill a vacuum, yet still it sucks." - Dennis Ritchie
The answer is both. It is stored in main memory, which is cached on chip. The CMS sets the memory usage on boot, but the OS can change the allocation on the fly. It's in the Ars article.
I missed the webcast, so reading this article was something of a revelation to me. I'm amazed by all the things they do differently than any processor it is compared to, and although I know nothing about StrongARM or other mobile processors I have no trouble believing that it's completely different from them too. Just how long did the developing of the Crusoe take?
The Slashdot crowd (and me too) usually feels reluctant about doing things in software that could be done in hardware, e.g. WinModems. However, the Crusoe does this for a good reason: to save power, not production cost.
I'm already waiting for the Transmeta ads showing not MHz, but MHz/W numbers..
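For what it's worth, the MHz/W figure is just clock rate divided by power draw. A minimal sketch of the arithmetic follows; the chip names and numbers are hypothetical, picked only to show the calculation, and are not measurements of any real part.

```python
def mhz_per_watt(clock_mhz: float, power_watts: float) -> float:
    """Clock rate delivered per watt of power drawn."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return clock_mhz / power_watts


# Hypothetical figures (clock in MHz, power in watts), for illustration only.
chips = {
    "low-power chip": (400.0, 2.0),
    "desktop chip": (800.0, 40.0),
}

ratings = {name: mhz_per_watt(mhz, watts) for name, (mhz, watts) in chips.items()}
for name, rating in ratings.items():
    print(f"{name}: {rating:.0f} MHz/W")
```

With these made-up numbers the slower chip comes out ten times ahead on MHz/W, which is the kind of comparison such an ad would be going for.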
Technically, the immediate candidates which came to mind to pull such a trick have been X86 companies. They have a lot of experience in converting one instruction set (X86) to another (whatever their chip "really" runs). For some strange reason, Intel never made anything of this
Transmeta, on the other hand, seem to have both the technology and the lack of, shall we say, unfortunate entanglement with a certain software company
Given that the same chip could also run another instruction set at the same time (it's all in the code morphing software, after all), you'll get a machine which runs native Windows/Be/Linux/etc. applications, but doesn't discriminate against Java ones. In fact, it might even encourage them. Java bytecode should be much easier to optimize than X86...
If Sun has any sense, they would be starting to work on such a thing ASAP. It is a natural fit to their Jini initiative - not to mention the HAVI one. If Transmeta would release a Java-friendly chip with tailored HAVI support, they could be "the" choice for consumer electronic devices.
Life is definitely going to be interesting in the next few years...
Crusoe IS NOT FOR GAMING MACHINES.
Crusoe IS NOT FOR SERVERS.
Crusoe WILL NOT REPLACE THE PIII and Xeon and Athlon et al.
Crusoe is for machines where high end performance is secondary to EFFICIENT PERFORMANCE. (laptops etc..)
Will you see an SMP capable Crusoe? Probably not. Why? Because you don't need SMP to run Word and Netscape well.
Will you see desktop computers running Crusoe processors? Probably very few, and most of those will be built by people like the Slashdot crowd who believe it's the best processor for all jobs, and those who just like to tinker.
Crusoe isn't designed to be the best in all areas. It is designed for the mobile market. If you want to put together a server or a blazing fast Quake machine.. DON'T GET A CRUSOE. That's not what it is designed for.
Thanks Ars Technica for not falling for the hype and telling it like it is. And thank you for a wonderful technical briefing. As always, the technical writing prowess of the Ars Technica staff impresses me. Most people have a VERY difficult time explaining technical issues half as well.
"Anyone who can't laugh at himself is not taking life seriously enough." - Larry Wall
Doesn't it stand to reason that Linux compiled for Crusoe would run faster than Linux compiled for x86? I would like to see tools available to compile the base OS in native instructions and run the apps in x86. This should give a significant performance boost. Maybe a tool to convert binaries and save the result to disk? The users could DL apps like WordPerfect and convert them to Crusoe's binary format. If one is running only open source software, then one could recompile the world. Now that's what I'd like, a Crusoe all-day laptop running Crusoe Linux. ;-)
Tim Riker - http://rikers.org/
That is an excellent article. If you are into CPUs you should read the other articles he has done on Ars Technica (linked to at the bottom of the Transmeta article).
${YEAR+1} is going to be the year of Linux on the desktop!
Can you sign up to receive the developer's kit thingee Transmeta has on their website if you're not in a company and will probably never really produce anything? I'd just love to read about it and toy with ideas, but I doubt I'd ever produce anything of real value... I don't want to sign up and have them smack me on the ass...
Esperandi
The marketing guy explained that Transmeta has no hype, it's all buzz ;) Buzz is when other people speculate about your company and products, hype is when you speculate about your own company and products... I'm not sure which is better, or which leads to fewer let-downs...
Esperandi
> I want to see Crusoe vs StrongARM.
And I want to see Crusoe vs ARM's ARM10 (400 Dhrystone 2.1 MIPS at 300 MHz... optional Vector Floating-Point unit capable of delivering 600 MFLOPS ...)
The StrongARM design is over four years old now and development of the StrongARM family has nearly stopped after Intel bought it from Digital. Fastest StrongARM when launched (5th Feb. 1996): 200MHz SA110; fastest StrongARM Digital made: 233MHz SA110; fastest StrongARM Intel makes now: 233MHz SA110...
I think SMP is a good idea for these chips. If the architecture can be modified to do it, I don't see why you couldn't have four or eight of these on one board. If you've ever looked inside the chassis of modern dial-up gear (the "modems" on the ISP's end, not the POTS device with the red blinky lights you have on your serial port), you know it's not unreasonable to have upwards of 8 processors - such as the i960, in Nortel's CVX gear - on one card alone, with numerous cards in one chassis.
At that point, you could build a massively parallel single computer, or a cluster of them if you needed even better/more redundant/more fault tolerant/whatever performance.
Everything in the docs for Quake 3 implies that you *need* some form of hardware acceleration to even run the game at all. Mesa lets you get around that, but at what a cost!
Anyway, if you check Transmeta's circuit diagrams, a large part of the chip is blocked off as a "Floating Point / Graphics Unit." True, the chip has a rockin' 128-bit core, but I doubt that it could handle the general floating-point calculations needed for transform and lighting as well as the rest of the 3D pipeline normally handled in the accelerator card (read: fillrate).
Although if they're powering digital LCDs they don't need massive, hot, expensive 350 MHz RAMDACs...
Tetris rules.
IBM in the '70s had a problem... they could build more powerful processors, but had to keep compatibility with previous ones.
Part of their solution was to rig the OS so it could "host" other, more primitive OSes and make the appropriate calls to the new CPU as needed.
Looks like Transmeta just re-applied this, only at a layer much closer to the silicon.
I guess those who don't study history (are you listening, Intel?) are doomed to be defeated by those who do...
Meow
Yes, that's really my e-mail. Don't change a thing.
Why? The argument usually goes like "Hey, I have some component here that in theory I could be using but they won't write drivers for me or release the specs so I can do it myself." The counter argument is to go buy a "real" modem with everything implemented in silicon. Pretty soon people will be complaining, "Hey, I bought this laptop and it won't let me run LinuxPPC, even though it is clearly capable of doing so, if they wrote a code morpher or released the VLIW specs so I could do it myself." A similar counterargument is: go buy a real processor that supports all this in silicon.
...to, instead of writing software for the code-morphing and then run it on the normal chip, instead make two smaller chips, one being the Crusoe and the other being another chip that would be optimized for the functions code-morphing needs? He mentions that most x86 chips have all these functions on the chip, and the chip is optimized for those functions, which translates into a bigger/hotter chip but also a faster chip. In Crusoe, the stuff is done in software, which actually means that Crusoe still has to do it, except in a different way and without optimized hardware. Now, the reason for this is to keep heat/size/energy use down.
So, why not make two chips (you could put a heatsink on both if you wanted them to be well cooled), and first send the instructions through one (the 'Friday,' as it were) which would translate the x86 instructions, do the branch predict, register rename, and instruction reorder. Then you send it along to your main chip (the 'Crusoe') which would then do the processes. In other words, instead of just making the chip smaller and then running what you took off the hardware in the software, move it on to another chip. This would:
A) Increase the amount of airflow you could get over your chips (because there are two chips instead of one). And...
B) Increase the amount of work the Crusoe could do, because it isn't doing all that translation anymore. This would have the added bonus of making the Crusoe even cooler.
Now, I suppose the tradeoff would be that, since you're running on two chips instead of one, the information has to go farther (it has to travel between the chips instead of within the chip). Except that if you put your 'Friday' in the way between your Crusoe and its input, you'd be replacing that much wire. I don't know how much of a speed loss it would be, but I don't think it would be much. And since you'd never have to send information from the Crusoe to the Friday, there wouldn't be any problem there; when the Crusoe does its thing, it can just send the output directly to whatever needs it, bypassing the Friday.
Also, it might not fit in a standard motherboard, except that since the Crusoe doesn't seem like it will be sold alone (it looks like it will be built into portable devices), it shouldn't be a problem.
Then again, I could be wrong.
I watched the entire Transmeta presentation yesterday (~2 hrs long). From what I saw, I got the impression that the "Code Morphing Software" also serves as a layer of abstraction, allowing Transmeta to change the underlying CPU implementation or instruction set without breaking applications. I even saw (I think in another /. post) that even the VLIW instructions are at least partially translated by the "Code-Morphing" software into a lower-level format.
Playing around with the low-level stuff - including branching, etc - would be a blast, but I got the impression that Transmeta would remain reluctant to release specs, for fear of being forced into the backward-compatibility game, much like Intel.
This is one of the questions I really would have liked to hear asked at the press conference - "Are there any plans/hooks in place for SMP operation?".
Massive SMP looked very probable IMHO - especially the heat/power consumption angle of it.
Is there opportunity here to somehow make Java faster? It seems redundant that the JVM converts to native code which gets converted to Crusoe's ISA. What if a JVM could directly convert to Crusoe's ISA. Please excuse my ignorance on technical matters; and please no flames about Java.
ActiveX? Ewwwwww! :)<humour>
OK, RedHat may not provide Pentium-optimised packages, but if the _chip writers_ write the compiler optimisers (which they could put the team who are currently working on the Code Morphing Software on, as that wouldn't be necessary anymore) then they could ship the relevant compiler for the chipset, along with an OS compiled with that compiler on the box they ship.
This would also mean that even though the compiler is optimised to some degree by the chip vendor, the Open Source community would be able to play with it as much as possible to eke out maybe a little more speed, which they can't do at the moment with the Code Morphing stuff (as far as I have been led to believe).
(Note: this is still devil's advocate. My personal opinion is still that the Crusoe range sounds like seriously cool stuff - so no flames please :) )
Why doesn't the gene pool have a life guard?
Playing devils advocate here (sort of)
The argument you're making surely only holds water for closed source software.
The only reason that Intel *has* to maintain backwards compatibility with the 386 (and even 286?) is because there's a load of really old *binaries* out there that won't run if you remove some of the old instructions. Surely with OSS, all you need to do is write a new back-end for your favourite cross-platform compiler suite (gcc, anyone?), rebuild your app and copy it to your new computer with its brand spanking new chip that doesn't have anything in common with anything that's come before it, and it'll all still run fine.
You want to junk those extraneous FPU instructions that now have equivalents in your new SIMD unit? Go ahead. The new compiler back-end you've just written to accompany your new chip won't generate any of those old FPU instructions; it'll pass that work to your SIMD unit.
Backwards *binary* compatibility is a *closed source* problem.
Why not have your compiler generate native Crusoe 3400/5400 instructions (if such things exist).
K.
Why doesn't the gene pool have a life guard?
The morphing software is going to be stored in flash ram and loaded into system memory (or maybe the L1 cache?!?). Sounds like a pretty crazy scheme they have come up with.
I think that a processor like this aims not to be flaming fast but to be flexible. I doubt they made this thing so they could get two hundred fps out of Quake in software mode. I personally think that flexibility or extendability is more important than speed because if you want to change something, you don't mess everything else up. Wouldn't it be neat if everything was backwards and forwards compatible?
Here's a non-hardware example: Oracle. Originally, ORCL used basic heuristics and rule-based optimization. However, for large DB's and high-throughput installations, the big win comes with the Explain Plan and the performance-based optimizer. In newer versions, they will stop supporting the rule-based optimizer entirely. (read Oracle Performance Tuning)
Simply stated, there are things you can know at run time that cannot be determined at compile time, so optimizing at run time can be more efficient.
Make sure everyone's vote counts: Verified Voting
That's the whole point of Crusoe, you DON'T code for it directly. It takes other instructions, starting with x86, runs them faster and better, and optimizes on the fly.
The "code morphing" layer is what makes Crusoe stand apart from the rest. It optimizes the instruction set it's running on the fly. This means that your apps will run faster and faster as they run. This layer is what gives the Crusoe its speed. Coding natively would be SLOWER than using the morphing layer. You also wouldn't get the benefit of the optimization.
What about designing a virtual architecture that is very easy for the code morphing engine to translate, gets rid of performance-degrading quirks (like the x86 exception handling mechanism), and also allows you to give hints to the optimizer and code morpher? Programs compiled to this virtual architecture would execute faster than x86 code, but would still take advantage of all of Crusoe's features and also be runnable on all of the different Crusoe chips.
Give a man a fire, and he'll be warm for a day, but set him on fire, and he'll be warm for the rest of his life.
That's the old way of thinking. Crusoe sleeps drawing less than 20 milliwatts. Who turns this thing off?
It runs Linux. Who needs to turn this thing off?
It runs Windows... well, that's a different matter.
Transmeta: the processor rethought.
-- @rjamestaylor on Ello
Why not? It's a CPU. It can run Linux. There's no reason I can think of why it shouldn't be able to be the CPU in a cluster node. The only problem I see is the price (min. $65 for the cheapest and $120+ for the bigger one), which is not exactly low.
Help! my
This article brings up some interesting possibilities, and I'm wondering how viable some of the logical extensions to these possibilities are.
He mentions, for instance, that PPC/Alpha emulation is theoretically possible. Would it be possible to do both at the same time, some sort of hybrid Mac/PC?
Also, if the translation is done in software, could it not be possible for the BIOS/OS to recognize one x86 chip, but have the code morphing software actually be translating for X processors in parallel? There would be no recompiles necessary, and it should run as a super fast x86 box, shouldn't it?
Thanks for all the great info!
This is my .sig. It isn't very big.
--An Oldie, but a Goodie!
You have a genuine Intel x86 chip running x86 software with hard instruction set. Then, you have "cool shoes" running very long instructed road with soft ware, morphing to be x86 compartible. How does that make it any faster than an intel chip ? Even Einstein cannot break the speed of light and Linus cannot break the software gates. Before I go any further towards off topic, all I want to say is, don't trust Transmeta! They make claims but show no real benchmark or real solid evidents. I felt pity for Linus after I watched the webcast.. what was he thinking, being manipulated by some corporates "bad guys" to play quake like a kid. Remember what the guy said? It was his show!! I bet they fixed the match, so that Linus will loose and makes Linux looks bad. Now, take a look at transmeta's website and see who are the bosses? Linus is nobody, an employee and a tool used by a startup company to attact Microsoft's attention. Oh yeah, Linux will run on the 400mhz chip and that's all it can do, forever doom in a rom chip. If you want real mobile solution, try the 700mhz solution that runs on Microsoft Windows!!! (Too bad this is just a joke and do not represent my view of the actual event.)
We've seen a 100% increase in speed and a significant decrease in footprint with the HotSpot Java VM. This is all due to the profiling info it collects while running code. The same goes for the Crusoe morphing layer. Now what would be interesting is to see HotSpot on Crusoe vs. MAJC!
I'm Canadian, so my money isn't worth much, and I'm just out of college and just started working, and I could still afford to buy about 4 of the high-end ones per month (minus motherboard & extra hardware requirements). My computer would go up by approx. 2.4 GHz per month!
Considering the low size/heat/power usage of these chips, they could probably squeeze several of these into one chip. Imagine a 3GHz processor that can run x86/Macintosh/Alpha/etc. with just a change of software.
I'm afraid that I have to agree with this posting. However, I can see a lot of uses for the Crusoe. I have a hundred users who never do anything but run word processing and who absolutely don't need a PIII or anything much above a Pentium 233. In fact, I have trouble getting a 'small' enough machine now. If the cost is right for a lightweight desktop, I'll stop buying Intel... Not to mention the 'pat on the back' I'll get for reducing the monthly power bill.
Still drooling for another Alpha at my site..
My other car is a motorcycle!
I think that, given that the chip hosts some of its functionality in software, writing to the VLIW native set wouldn't improve things because it still needs to be massaged by the code morph software, i.e. the core can't run "everyday" software without going through code morph, ipso facto.
For instance, you go through all the trouble to compile for the VLIW ISA and then, given the nature of the chip, it can't even run your binary directly on the core anyhow; it has to go through the morpher to enjoy 100% of the advantages of the architecture.
At that point you might as well use the most (commercially) successful ISA ever, x86 which they have worked hard to optimize the code morpher for, anyhow.
"Sig free in '03!"
The PowerPC architecture was designed primarily by IBM and Motorola and the specs have been publicly available for years.
I can just imagine it now... an Open Source instruction set for a virtual processor running only on Crusoe (under Linux of course.)
My only question here is whether or not the instruction set that the game/application is compiled against makes a large difference in the ultimate performance. My guess is that having a specific instruction set that is designed specifically for a given purpose (i.e. fast games) would boost performance tremendously.
************************
This .sig space for rent
************************
There is a lot of glitzy information now available about Crusoe VLIW, a core instruction set that is nothing like x86, and the code morphing software. But the actual technical nitty gritty seems to be lacking. Can a program get access to the core instruction set, thus bypassing the code morphing? Is it possible to detect the Crusoe processor with an x86-compatible instruction, so that in performance-critical sections of an application Crusoe-specific/pre-morphed code can be run if the Crusoe is detected, but the application can still execute standard x86 code if it isn't detected? Can a programmer provide their own code morph software, thus turning Crusoe into a fast Z80 for example? Does Transmeta have plans to code morph other instruction sets like PPC? And does "Linux Mobile" contain any Crusoe-specific instructions or does it depend completely on the software code morph of x86?
PPC chips aren't really aimed at the mobile market. I want to see Crusoe vs StrongARM.
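On the detection part of that question only: x86 software normally identifies the processor through the CPUID vendor string, which Linux exposes as the vendor_id field in /proc/cpuinfo. Below is a minimal sketch that checks that field. The two vendor strings are the ones commonly listed for Transmeta parts and should be treated as an assumption here, since nothing in this thread gives them; the function names and the sample text are made up for illustration. Detecting the chip this way says nothing about whether pre-morphed code could then be run on it.

```python
from typing import Optional

# Assumed CPUID vendor strings for Transmeta parts (not taken from this thread).
TRANSMETA_VENDOR_IDS = {"GenuineTMx86", "TransmetaCPU"}


def vendor_id_from_cpuinfo(cpuinfo_text: str) -> Optional[str]:
    """Return the first vendor_id value found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def is_transmeta(cpuinfo_text: str) -> bool:
    """True if the reported vendor string is one of the assumed Transmeta IDs."""
    return vendor_id_from_cpuinfo(cpuinfo_text) in TRANSMETA_VENDOR_IDS


# Hypothetical sample input; on a real Linux box you would read /proc/cpuinfo.
sample = "processor\t: 0\nvendor_id\t: GenuineTMx86\ncpu MHz\t\t: 600.0\n"
print(is_transmeta(sample))
```

An application could use a check like this to choose between two x86 code paths tuned differently, which is about as far as detection alone gets you.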
-Yarn - Rio Karma: Excellent
Now I suppose Transmeta could design a full O-O-O core, but I don't see the point. If the software does a good job, the additional flexibility they gain to change the underlying machine is worth it.
As far as branches go, yes, you usually can guess a backward branch is going to be taken. But branches are still a huge problem. It's tough to keep a processor core fed. And don't even get me started on multiple branch prediction. The hit rate goes way down. A study was done here that showed processors today (or in the near future) spend about half the time recovering from branch mispredictions. That's a lot of wasted work. While the code morphing software can't do a perfect job, it is somewhat easier to tune the chip. And then think about per-application tuning. Load a different set of rules depending on the program you're running.
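To make the "backward branch is usually taken" guess concrete, here is a small sketch of that static heuristic scored against a branch trace. The trace format, addresses, and function names are invented for the example; it models no particular processor.

```python
from typing import Iterable, List, Tuple

# One trace entry: (branch address, target address, whether it was taken).
BranchEvent = Tuple[int, int, bool]


def predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static heuristic: backward branches (loops) are predicted taken."""
    return target_pc < branch_pc


def count_mispredictions(trace: Iterable[BranchEvent]) -> int:
    """Count the entries where the static guess disagrees with what happened."""
    return sum(
        1 for pc, target, taken in trace if predict_taken(pc, target) != taken
    )


def loop_trace(iterations: int, branch_pc: int = 0x120,
               loop_top: int = 0x100) -> List[BranchEvent]:
    """A loop-closing branch: taken every time except on the final exit."""
    return [(branch_pc, loop_top, i < iterations - 1) for i in range(iterations)]


# A forward branch that goes each way half the time (data-dependent 'if').
coin_flip_trace = [(0x200, 0x240, i % 2 == 0) for i in range(100)]

print(count_mispredictions(loop_trace(100)))   # loop branch
print(count_mispredictions(coin_flip_trace))   # data-dependent branch
```

The loop branch is only missed on exit, while the data-dependent branch is missed half the time. That second kind is where recorded history, in hardware tables or in software, is supposed to help.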
Interesting, no? :)
--
- This architecture allows for some interesting optimizations not feasible in conventional CPUs.
- High performance on the desktop is also interesting: "So you see, they made the Code Morphing software extremely modular. They can implement whatever parts of it they like in hardware to get whatever degree of performance gain they want. Crusoe should be viewed more as a proof of concept than as the ultimate outcome of 5 years of work. Crusoe represents one extreme of a spectrum that stretches from "implement the bare minimum in hardware" to "implement everything in hardware." Now that Transmeta has a technology that's proven to work in the most difficult case (where 2/3 of the transistor logic has been moved into software), they can go back in the other (easier) direction and start putting stuff in silicon.
I, for one, am really excited about the possibilities."
"Crusoe's Code Morphing software not only keeps track of which blocks of code execute most often and optimizes them accordingly, but it also keeps track of which branches are most often taken and annotates the code accordingly. That way, Crusoe's branch prediction algorithm knows how likely a branch is to be taken, and which branch it should speculatively execute down. If a branch isn't particularly likely to go one way or the other, then Crusoe can speculatively execute down both branches.
Contrast this with speculative execution done on a normal CPU, where hardware limitations like buffer and table sizes limit the amount of information you can store about a particular branch and its execution history. Since Code Morphing keeps track of the branch histories in software, it can record a more finely grained description of the execution patterns of a wider window of code, and therefore assess more accurately whether or not a specific branch is likely to be taken."
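The bookkeeping described in that quote can be sketched roughly as below: per-branch taken counts kept in ordinary software data structures, with advice of "taken", "not-taken", or "both" when the history shows no strong bias. The class name, the 0.75 bias threshold, and the addresses are assumptions made for illustration, not anything Transmeta has published.

```python
from collections import defaultdict


class BranchProfile:
    """Software-side branch history: counts per branch address."""

    def __init__(self, bias_threshold: float = 0.75):
        self.bias_threshold = bias_threshold  # assumed cut-off for "strongly biased"
        self.taken = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, branch_addr: int, was_taken: bool) -> None:
        """Log one execution of the branch at branch_addr."""
        self.total[branch_addr] += 1
        if was_taken:
            self.taken[branch_addr] += 1

    def advice(self, branch_addr: int) -> str:
        """Annotation a translator might attach to the branch."""
        seen = self.total.get(branch_addr, 0)
        if seen == 0:
            return "unknown"
        ratio = self.taken.get(branch_addr, 0) / seen
        if ratio >= self.bias_threshold:
            return "taken"
        if ratio <= 1 - self.bias_threshold:
            return "not-taken"
        return "both"  # no clear bias: speculate down both paths


profile = BranchProfile()
for i in range(10):
    profile.record(0x40, was_taken=(i != 9))      # taken 9 times out of 10
    profile.record(0x80, was_taken=(i % 2 == 0))  # taken 5 times out of 10

print(profile.advice(0x40), profile.advice(0x80), profile.advice(0xC0))
```

Because the history lives in a dictionary rather than a fixed-size hardware table, the number of branches tracked is limited by memory rather than by silicon, which is the contrast the quoted passage draws.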
Furthermore, since there's a software layer between the ISA of the binary and the machine's native ISA, Transmeta is free to beef up the execution engine (or any other part of the core) however they like, because the only thing that will require a recompile is the Code Morphing software. A case in point is the two chips in its product line. Each has a slightly different core (the Windows chip has special instructions in it that help speed up Windows), but they both are fully x86 compatible. There's nothing to keep them from stuffing new functions and features (SIMD anyone?) into the silicon, to help scale the product as high up as they want to go with it.
I'd say that it's only a matter of time before we hear an announcement of another product line from Transmeta. It won't be named Crusoe, because it won't be aimed at the mobile and embedded markets. It'll be a workstation and server class x86 CPU that runs Linux like a fiend, and it'll compete directly with Intel's IA-64. I can't wait."
--
My opinions may have changed, but not the fact that I am right.
****Gfx Scrollbar Special case hit!!*****
I'm curious to know how OSes will handle this. For example, we've already had a thread on the linux-kernel list about timing loops being thrown off by this for existing laptops (because the bogomips on which they're based are calculated at boot time). What was the outcome of that thread? Was a solution reached? Will it apply for Crusoe too?
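To show why a boot-time calibration goes wrong when the clock changes afterwards, here is the arithmetic as a small sketch. The clock speeds and the one-cycle-per-loop figure are hypothetical, and the function names are made up; this is not how any particular kernel implements its delay loop.

```python
def calibrate_loops_per_ms(boot_mhz: float, cycles_per_loop: int = 1) -> float:
    """Loops that fit in one millisecond at the clock speed seen at boot."""
    # 1 MHz is 1,000,000 cycles per second, i.e. 1,000 cycles per millisecond.
    return boot_mhz * 1000.0 / cycles_per_loop


def actual_delay_ms(requested_ms: float, loops_per_ms: float,
                    current_mhz: float, cycles_per_loop: int = 1) -> float:
    """Wall-clock time a busy-wait really takes at the current clock speed."""
    loops = requested_ms * loops_per_ms           # fixed by the boot calibration
    return loops * cycles_per_loop / (current_mhz * 1000.0)


# Hypothetical laptop: calibrated at 600 MHz, later throttled to 300 MHz.
loops_per_ms = calibrate_loops_per_ms(600.0)
print(actual_delay_ms(10.0, loops_per_ms, 600.0))  # at the boot clock
print(actual_delay_ms(10.0, loops_per_ms, 300.0))  # after throttling down
```

A 10 ms delay requested after the clock halves takes twice as long. The more dangerous case is the reverse: calibrate slow, then speed up, and the delays come out too short.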
"The invisible and the non-existent look very much alike." -- Delos B. McKown
Correct. Compilers for the AS/400 (and its System/38 predecessor) for the languages in which applications are written generate code for a virtual machine with a very CISCy instruction set; low-level OS code translates that to the native instruction set. (That long antedates Transmeta; as indicated, it dates back to the System/38, which I think came out in the late '70's; IBM needed no technology from Transmeta to do that - binary-to-binary translation is hardly a Transmeta invention.)
It isn't done in exactly the same fashion, in that, on S/38's and AS/400's, the low-level OS code is written in languages that compile (or, for some code, assemble) into the native machine's instruction set, unlike Crusoe, where the only native code that's run is the translation software and the output of the translation software. Also, I don't think the translation on AS/400 is done as dynamically; I think programs are translated in their entirety the first time they're run, and the executable code for the entire program is kept around.
Comparisons to the PowerPC chips.
After all, the Crusoe architecture is not a performance demon aimed at desktops/servers, and it is not aimed at the ultra-low power consumption StrongARM market. But it might be suitable for the sorts of applications that embedded PPCs are currently used in...
Steven E. Ehrbar
No, no, no, **YOU** STILL DON'T GET **IT**.
As far as I can tell, the Crusoe processor engine itself is not special. If you are a "talented programmer programming to the bare metal", you might as well program in assembly on another pre-existing chip.
And then as a chip manufacturer, you'll face 20 years of trying to keep supporting the vintage instruction set that those bare metal hackers employed.
You're missing the point.
Take database servers. Oracle, MySQL, Informix, Sybase, Uncle Joes Ultimate Data Thingy... Just about all of them allow access to their data through a standard SQL language.
But... But... but... Wouldn't it just be so insanely cool and fast if I could just directly access the ISAM structures and indexes and modify disk sectors directly?!?! I fully expect every dedicated DBA and application designer to go to the bare iron to squeeze performance from their data warehouses!
Has that happened? No. Why? Because MOST EVERYDAY APPLICATION DESIGNERS DON'T "PROGRAM TO THE BARE METAL". It's too complex, intensive, and fruitless a task. Why is Slashdot written in Perl and not assembly? Why isn't Linux 100% x86 assembly?
There is a BIG difference between just a cool hack and maintainable elegance.
Why do we have high level languages? Why do we have abstraction layers? Why?
The Code Morphing is an abstraction layer. Initially, that layer is the x86 instruction set, an arbitrary set of instructions that just happens to currently be widely used. Using Code Morphing, the Crusoe can leapfrog on that wide base of support, while throwing away the hardware architectural garbage traditionally needed to support it.
Back to SQL: Oracle supports SQL for access to data, but beneath, I'll bet you that a lot of the specific operations upon data that those SQL statements fire off has changed ENORMOUSLY over the years. What would have happened had they allowed programmers straight past the abstraction layer? They still would be trying to support that API today, and I bet they wouldn't be as free to rework their server software.
Furthermore, why do we have the DBI module and DBD modules in Perl? To provide a semi-universal abstraction layer across all databases. When one database's API changes for performance reasons, efficiency, whatever, you just change the morphing-- er DBD-- layer to accommodate it.
What is the point of Crusoe then?
Not to provide assembly hackers with a new opcode set to learn and tweak, which 90% of the application design world will never learn or exploit, and therefore will remain voodoo essentially.
The point is to provide an architecture which supports ABSTRACTION LAYERS of assembly opcodes. So Transmeta is free to vary the underlying hardware in any exotic or esoteric form they see fit, throwing backwards compatibility of their VLIW opcodes to the wind because the Code Morphing allows the SAME ABSTRACTION LAYER API to be exposed to the application designer.
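Here is a toy version of that argument in code: one fixed "guest" instruction set, two made-up native cores with different instruction sets, and a translator per core. The guest program never changes. Everything here (the opcode names, the stack-machine design, the fused add-immediate on core B) is invented for illustration and has nothing to do with the real VLIW format.

```python
from typing import List, Tuple

Op = Tuple  # guest ops look like ("push", 2) or ("add",)


def morph_for_core_a(guest: List[Op]) -> List[Op]:
    """Core A: a one-to-one mapping onto its own opcode names."""
    table = {"push": "A_PUSH", "add": "A_ADD"}
    return [(table[op[0]],) + tuple(op[1:]) for op in guest]


def run_core_a(native: List[Op]) -> int:
    stack: List[int] = []
    for op in native:
        if op[0] == "A_PUSH":
            stack.append(op[1])
        elif op[0] == "A_ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"core A cannot execute {op!r}")
    return stack[-1]


def morph_for_core_b(guest: List[Op]) -> List[Op]:
    """Core B: has an add-immediate, so 'push n; add' is fused into one op."""
    native: List[Op] = []
    i = 0
    while i < len(guest):
        op = guest[i]
        following = guest[i + 1] if i + 1 < len(guest) else None
        if op[0] == "push" and following is not None and following[0] == "add":
            native.append(("B_ADDI", op[1]))
            i += 2
        elif op[0] == "push":
            native.append(("B_LOAD", op[1]))
            i += 1
        elif op[0] == "add":
            native.append(("B_ADD",))
            i += 1
        else:
            raise ValueError(f"unknown guest op {op!r}")
    return native


def run_core_b(native: List[Op]) -> int:
    stack: List[int] = []
    for op in native:
        if op[0] == "B_LOAD":
            stack.append(op[1])
        elif op[0] == "B_ADDI":
            stack[-1] += op[1]
        elif op[0] == "B_ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"core B cannot execute {op!r}")
    return stack[-1]


guest_program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("add",)]
result_a = run_core_a(morph_for_core_a(guest_program))
result_b = run_core_b(morph_for_core_b(guest_program))
print(result_a, result_b)
```

Both cores give the same answer from the same guest program, even though core B runs three native ops where core A runs five. Anyone who had hand-written A_PUSH/A_ADD code would be stuck on core A forever, which is the backwards-compatibility trap described above.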
Now, finally, note I keep saying 'application designer'. This is as opposed to 'dedicated hacker'.
Read the definition of a hack. The first two definitions are not my idea of elegance. Something that's quick and does the job but not well. Or, something that is incredibly good, but took a long time.
Now, read the definition of elegant. Something that combines simplicity, power, and grace. Something that is understandable, almost obvious in its expression. Something maintainable.
Tell me what's more maintainable: Assembly code for the Mx-650938 processor, or Java code. It's a close call, but I'll have to go with the Java code. It's harder to write a hack in Java, than it is to create an elegant design in assembly.
It's not about performance. We haven't even BEGUN to wring the performance from the chips we have-- and why? because it's not humanly possible for every applications designer to be a brilliant assembly hacker, which is why we have compilers!
So, finally, why spend your time learning the latest opcode set when you can just focus on a higher level language and leave the hand tweaking and performance tweaks to the man behind the curtain of the Code Morphing abstraction layer of OZ?!?!?!
One of the things that Crusoe supposedly does is it caches frequently used code in its "compiled" form. This means that you only take a performance hit the first time you run it, and then it should run pretty much at full speed.
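A rough sketch of that caching idea: interpret a block the first few times, and once it proves hot, translate it once and reuse the translation from a cache. The class, the threshold of three runs, and the use of Python callables as stand-ins for guest code are all assumptions for illustration; the real software's policy is not described in this thread.

```python
from typing import Callable, Dict

HOT_THRESHOLD = 3  # assumed: interpreted runs before a block is worth translating


class ToyMorpher:
    """Interpret cold blocks; translate hot ones once and reuse the result."""

    def __init__(self, blocks: Dict[int, Callable[[], int]]):
        self.blocks = blocks  # guest address -> block (stand-in for guest code)
        self.run_counts: Dict[int, int] = {}
        self.translation_cache: Dict[int, Callable[[], int]] = {}
        self.interpreted_runs = 0
        self.native_runs = 0

    def translate(self, addr: int) -> Callable[[], int]:
        # Stand-in for the expensive translate-and-optimize step.
        return self.blocks[addr]

    def run(self, addr: int) -> int:
        cached = self.translation_cache.get(addr)
        if cached is not None:
            self.native_runs += 1          # fast path: reuse the translation
            return cached()
        self.interpreted_runs += 1         # slow path: interpret this time
        self.run_counts[addr] = self.run_counts.get(addr, 0) + 1
        if self.run_counts[addr] >= HOT_THRESHOLD:
            self.translation_cache[addr] = self.translate(addr)
        return self.blocks[addr]()


morpher = ToyMorpher({0x100: lambda: 42, 0x200: lambda: 7})
results = [morpher.run(0x100) for _ in range(10)]  # hot block
morpher.run(0x200)                                 # run-once block
print(morpher.interpreted_runs, morpher.native_runs)
```

The block at 0x100 pays the slow path three times and then runs from the cache, while the run-once block at 0x200 is never translated at all.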
If they give you access to the underlying architecture, then they are committed to keeping that architecture in future versions. This way they can make up a new ISA for every chip, and just tweak the code morphing layer to make it work.
This gives them a performance hit now, but as Intel is forced to continue to support the x86 architecture in hardware for every new chip, they will have to make their chips ever bigger and ever hotter. Transmeta's approach will likely prove superior in the long run.
Because writing a new code morpher for this architecture would take R&D dollars that would be better spent emulating real architectures like PPC or IA64 with existing user bases. The small increase in performance you'd get from a "native" ISA would not justify the additional costs of writing and supporting the software for it.
Also, it sounds like they are optimising each chip to specifically support code morphing from a specific architecture. That means that x86 *is* a reasonably efficient instruction set for this particular hardware. Yes, you could probably make a faster one, but the gains would be marginal unless you actually got direct access to the underlying ISA, which defeats the whole purpose of this strategy.
It seems a lot of posters are thinking the same thing. But...
You could say the same about a Celeron/P-III/Athlon/Whatever.
"I wonder how much faster my Athlon would go if I could rip out the silicon that does the instruction decoding / reordering / branch prediction / etc and code directly for the execution units."
It probably wouldn't go much faster (I'd guess that silicon does its job pretty well) but by ripping out all those transistors you could significantly reduce power consumption.
In fact, if you think it through for five years or so you'll probably wake up one day and find you've re-invented Crusoe. Of course it'll be old news by then.
Like the Ars article, it was well written. I think the Crusoe is impressive because it does what RISC was originally conceived to do. Look at MIPS: it's a RISC architecture, yet it has some of the most complex processing units you'll find. Things like Crusoe and MAJC really rattle the cages of other chip makers because they take an entirely different approach to chip design. Even PPC is getting really complex, especially with the AltiVec unit added onto the die; while it improves performance in some calculations, it adds significantly to the price and complexity of the chip. The human brain can calculate some pretty complex things, yet its processing is done in a massive number of simple processes rather than a small number of complex ones. I think the next generation of supercomputers will be built a little more like Crusoe chips, maybe even using Crusoes. The more times it works a calculation, the faster it does it; this would add phenomenal performance to a lot of the things we use supercomputers for right now. Maybe in the next ten years we'll see desktop teraflop systems.
I'm a loner Dottie, a Rebel.
Ahem...
IF YOU WANT TO CODE DIRECTLY TO A VLIW CORE BUY A &*$#ING MERCED!!!!!!
Sorry about that. You lot just aren't getting it. If you remove the code morphing layer, then you have to put backwards compatibility into the hardware down the road. That means lots o' transistors and high power consumption 2 or 3 years down the road. That also means that compiler complexity goes up dramatically. So you'll wind up having a crippled architecture and low quality compilers 10 years down the road. That's stupid. Additionally, if the compiler is entirely responsible for the optimization, you lose the nifty on-the-fly code tuning based on actual runtime data -- this is the coolest thing about the Crusoe.
--Shoeboy
I also wonder whether it can multitask between different instruction sets. I guess the task switching overhead would be pretty brutal if there isn't room onchip for multiple instruction sets.
My understanding from the articles I have read is that maybe, eventually, but right now it only emulates x86.
My whole point was that branch prediction can be replaced by explicit pre-branch notification.
Branch prediction now is very stupid. Circuits try to guess, in real time, which branch will be taken. Imagine instead that the C compiler explained to the branch "predictor" that "this will loop 27 times, then stop looping".
Furthermore, explicit cache requests could be compiled. "I'll stay in this function for a while, but I'm also going to call these functions."
With profile-based optimizations and careful design you might never have a cache miss or a branch misprediction.
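The 27-iteration loop can be sketched as a toy simulation. This compares a textbook 2-bit saturating counter against a hypothetical compiler-supplied trip count; neither models any real chip, and the starting counter state is an arbitrary assumption.

```python
# Toy comparison: a 2-bit saturating-counter branch predictor versus a
# hypothetical compiler-supplied trip count. Neither models real hardware.

def loop_outcomes(trip_count):
    """Back-edge outcomes for a loop that runs `trip_count` times:
    taken on every iteration except the last."""
    return [True] * (trip_count - 1) + [False]


def mispredicts_two_bit(outcomes, state=1):
    """Count misses for a 2-bit counter (states 0-1 predict not taken,
    states 2-3 predict taken). The initial state is an assumption."""
    misses = 0
    for taken in outcomes:
        predicted = state >= 2
        if predicted != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


def mispredicts_with_hint(outcomes, trip_count):
    """Count misses when the trip count is known in advance (the hint)."""
    misses = 0
    for i, taken in enumerate(outcomes):
        predicted = i < trip_count - 1  # taken until the final iteration
        if predicted != taken:
            misses += 1
    return misses


if __name__ == "__main__":
    outcomes = loop_outcomes(27)
    print("2-bit counter misses:", mispredicts_two_bit(outcomes))
    print("hinted misses:", mispredicts_with_hint(outcomes, 27))
```

The hint only helps when the trip count really is known before the loop starts; branches that depend on incoming data would still have to be guessed.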
I've gotta get me one of these, and play around with alternative opcode sets. This is just the coolest toy for exploring computer architecture.
One would not write in the native VLIW - one would create a new instruction set that hid the VLIW but used its best features and interacted with the hardware better, i.e. saved the optimizations and the branch predictions for the next time the program is run. One need not write in VLIW to get rid of the x86 instruction set. (I wonder if one could design an instruction set to run one's favorite operating system: Linux, *BSD, ...)
Why not save the cache to permanent storage? The processor optimizes the code and then saves the optimized code to disk as a "shadow" executable. The next time the program is loaded, the OS would indicate that it has already been optimized and pass the shadow to the processor, which could bypass the translator. The translator could attach a signature to the shadow, and if the signature didn't match the program it would reload the program and translate from scratch. In this way, you would get permanently optimized code for all your programs while retaining the flexibility of the current design.
Of course, one problem with this would be getting support for shadow programs built into the OS. I wonder if Transmeta has anyone that could handle this?
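A minimal sketch of the shadow-executable idea, assuming the signature is simply a hash of the original binary. Everything here (`ShadowStore`, `translate`, the in-memory dict standing in for the disk) is hypothetical; nothing suggests Transmeta's software works this way.

```python
import hashlib

# Hypothetical shadow store: maps a program name to (signature, translation).
# A real design would live on disk and need cooperation from the OS loader.


def signature(binary: bytes) -> str:
    """Fingerprint of the original program; SHA-256 is an arbitrary choice."""
    return hashlib.sha256(binary).hexdigest()


def translate(binary: bytes) -> bytes:
    """Stand-in for the expensive translation step."""
    return b"native:" + binary


class ShadowStore:
    def __init__(self):
        self._shadows = {}
        self.translations_done = 0

    def load(self, name: str, binary: bytes) -> bytes:
        sig = signature(binary)
        entry = self._shadows.get(name)
        if entry is not None and entry[0] == sig:
            return entry[1]  # shadow still matches the program: reuse it
        # Missing or stale shadow: translate from scratch and save the result.
        self.translations_done += 1
        native = translate(binary)
        self._shadows[name] = (sig, native)
        return native
```

Loading the same bytes twice translates once; changing a single byte of the program invalidates the shadow and forces a fresh translation. The sketch sidesteps the harder question of what to save when only part of a program has ever been translated.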
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
The "code morphing" layer is what makes Crusoe stand apart from the rest. It optimizes on the fly the instruction set it's running. This means that your apps will run faster and faster as they run. This layer is what gives the Crusoe its speed.
The only way "code morphing" could run faster than native code is by exploiting runtime information to perform optimizations that are not possible at compile time. In other words, self-modifying code that runs faster than static code.
This is plausible, but that doesn't mean there would be no performance benefit in compiling native code. Research on self-modifying code is not unique to Crusoe---it's a very active area of research, and there are two major kinds: JIT and dynamic compilation. JIT, which you're probably all familiar with from Java, involves translating code (typically from a foreign instruction set) and performing optimizations at runtime; dynamic compilation involves "staging" code at compile time to modify itself in a disciplined manner at runtime. JITs and dynamic compilation are very different in the nature of optimizations they perform; one of the major differences is that because dynamic compilation performs its analysis at compile-time, it can theoretically perform much deeper and more sophisticated optimizations.
Crusoe does no staging (it can't: it executes fully precompiled code), so its optimizations operate under severe time constraints. Therefore, Crusoe's code morphing is likely to produce code optimality akin to that emitted by a JIT compilation system: shallower analysis, shallower optimizations. Which almost certainly makes Crusoe's "code morphing" worse than native staged dynamic compilation would be.
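For a flavor of what "staging" means, here is a deliberately tiny sketch: a general routine that, once a runtime value is known, generates a straight-line version specialized to that value. This is only an illustration of the idea in Python; it is not how the University of Washington project or any real dynamic compiler is implemented, and the function names are made up.

```python
# Illustrative only: "staging" a general routine so that it specializes itself
# once a runtime value (here, the exponent) becomes known.

def power_generic(base, exponent):
    """General version: loops on every call."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result


def specialize_power(exponent):
    """Generate a straight-line function for one fixed, non-negative exponent."""
    body = " * ".join(["base"] * exponent) if exponent > 0 else "1"
    source = f"def power_fixed(base):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)  # compile the specialized code at runtime
    return namespace["power_fixed"]


cube = specialize_power(3)  # behaves like: return base * base * base
```

The point of staging is that the decision about what to specialize on (the exponent) was made ahead of time by whoever wrote `specialize_power`, so the runtime step is cheap. A translator working from a finished binary has to discover such opportunities on its own, under time pressure.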
In summary: my point is that self-modifying native code that improves its performance at runtime is entirely possible without "code morphing". On the other hand, binary x86 compatibility is arguably Crusoe's major selling point, so there's not much impetus for them to bother encouraging any kind of native code compilation. Anyway, I get the impression that Crusoe's entire architecture would have to be revamped if they wanted to run native code so it's a moot point.
If you're thoroughly confused by now, try visiting the dynamic compilation project at the University of Washington for more information on dynamic compilation.
~k.lee
(BTW: this does not mean that Crusoe does not embody any technical innovations. In particular, the hardware support the chip provides for its runtime code translation is very interesting.)
(remove nospam for email)
You really, really still don't get it, do you? Firstly, Crusoe is the first chip Transmeta has got out the door. It's the simplest possible silicon, with the hard bits done in software. But there's no hard line between what functions can be done in hardware and what can be done in software. It's just that software is cheaper to tune.
When Transmeta have got code-morphing tuned the way they like it there is nothing to stop them releasing a new chip with the code-morphing engine in hardware.
But even if they don't, the limitation on performance computing design is cooling, as Cray amply showed. Crusoe consumes 1/32 the power of your PIII; so, for a given cooling system, you can stick 32 Crusoes in the same box. If each Crusoe gives you 66% of the compute power of the PIII, you've got a box which is going to deliver you more than 21 times the number of polygons your PIII can push.
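The arithmetic behind "more than 21 times" can be spelled out as follows. The 32 and 0.66 are the poster's figures; the `parallel_efficiency` parameter is an added assumption to show how sensitive the estimate is to imperfect scaling.

```python
# Worked version of the estimate above, using the poster's own figures.
POWER_RATIO = 32       # Crusoes that fit in one PIII's power budget (poster's claim)
RELATIVE_SPEED = 0.66  # one Crusoe as a fraction of one PIII (poster's assumption)


def aggregate_speedup(chips, relative_speed, parallel_efficiency=1.0):
    """Combined throughput relative to one PIII.

    parallel_efficiency=1.0 means perfect scaling across all chips, which is
    optimistic; it is a knob added here for illustration.
    """
    return chips * relative_speed * parallel_efficiency


if __name__ == "__main__":
    print(aggregate_speedup(POWER_RATIO, RELATIVE_SPEED))       # perfect scaling
    print(aggregate_speedup(POWER_RATIO, RELATIVE_SPEED, 0.5))  # half lost to overhead
```

With perfect scaling the product is 32 x 0.66, a bit over 21. Any workload that doesn't split cleanly across 32 processors lands lower.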
One thing I haven't yet seen quoted is the part-price for a Crusoe, but if the silicon is as simple as people are suggesting the part-price could be very low - small dies have relatively lower reject rates because if you have one flaw per square inch, every inch square chip has a flaw whereas only one in ten 0.3 inch chips does.
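The die-size argument can be checked with a simple defect model. This sketch assumes defects land independently at random (a Poisson model), which is a common textbook simplification and not something the poster stated; the function name is made up.

```python
import math


def fraction_flawed(defects_per_sq_inch, side_inches):
    """Fraction of square dies with at least one defect, assuming defects
    land independently at random (a Poisson model)."""
    area = side_inches ** 2
    return 1.0 - math.exp(-defects_per_sq_inch * area)


if __name__ == "__main__":
    for side in (1.0, 0.3):
        share = fraction_flawed(1.0, side)
        print(f"{side} inch die: {share:.1%} have at least one flaw")
```

Under this assumption, at one defect per square inch roughly 63% of one-inch dies are flawed rather than all of them, and about 9% of 0.3-inch dies are, which is close to the "one in ten" quoted above. The direction of the argument holds either way: small dies yield far better.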
By contrast your PIII is inherently an expensive part - it isn't expensive because Intel are profiteering, it's actually expensive to make. If Transmeta start shipping Crusoes at (say) around $10 per part in quantity, there isn't any way Intel can compete anywhere along the line.
I currently run two PII/300s in my desktop box. I bought them because two 300MHz parts and a motherboard to accommodate them were, at the time I bought them, a lot cheaper than one 500MHz part. If I can get, say, 8 400MHz Crusoes for the price of one 700 MHz Intel part, I will be quite happy to run them, and so I expect will a lot of other people.
Assuming, of course, that Linux 2.4 will run 8-way parallel on Crusoes, but I'm kind of prepared to bet it will :-)
I'm old enough to remember when discussions on Slashdot were well informed.
I'm sorry, I don't get it. Maybe I'm just dense. Why do all this "morphing" and optimizing at runtime, instead of at compile time? Binary compatibility with existing processors is a nice feature, and I'm sure it will help Crusoe get a foothold in the market, but why can't we at least have the option of bypassing the emulation when native software becomes available? (Or does the Crusoe already allow this? The reports haven't been clear on that.)
MSK
I originally posted this in a previous crusoe article but no one commented on whether it's actually feasible or not. Any big brain VLIW gurus want to tell me if what I suspect might actually be true?
The quake3 performance we saw on the ZDTV webcast was pretty damn impressive. Everyone seems to be assuming that they had 3d accelerators in those TM5400 laptops.
You can run quake 3 in software mode under mesa at about 3 frames per second.
But this is Transmeta we're talking about, and that was Dave Taylor, the SAME Dave Taylor that once leaked a document onto Usenet ranting about the inferiority of hardware graphics accelerators and saying that what he really wanted was a generic parallel processing chip that could do arbitrary transforms.
GEE, a lot like what the Crusoe chip can do?
(anyone got the link to that Usenet posting on Deja that Dave Taylor tried to cancel?)
Isn't it feasible that they have put hooks into their code morphing software that optimises specially for 3d transforms and mesa/opengl?
Especially in the linux version? Where they have all the source code to linux and mesa?
Hmm, what fancy optimisations could those clever brains come up with?
Maybe those Transmeta laptops WON'T need 3d accelerator chips?
And it would completely defeat the purpose of a low power laptop to put a big, hot, power-sucking 3d chip in it. So I'm assuming that demo of quake3 they showed WAS running in software mode with some pretty fancy dynamic optimisations going on.
Maybe the reason they didn't make a big deal about this is that it's still a "work in progress" as Linus said about mobile linux so they don't want to hype it yet.
Someone prove me wrong?
Quote:
Early in the next century, Dean hopes his new concoction, which he says is "in the idea and invention stage," will be ready for the public: a sleek tablet that is magazine-size, inexpensive, programmable, and voice-activated. He expects his unnamed dream pad, which will run on a 24-hour battery, to provide everything a PC does, including streaming audio and video, word processing, and spreadsheets. It will even have a port for old fogies who can't give up their keyboards. And it will wirelessly put the Internet and other information at your fingertips.
End Quote.
Of course the article never mentions Transmeta, but I bet this web pad would be powered by Crusoe. Here's the link for the article.
The current instruction sets of most processors are probably designed around certain price:performance ratios, with the cost of producing them as hardware as a major consideration. Transmeta could come up with their own virtual instruction set that would be optimized for their chips. It would be an easy move for the software developers, since their old code could still run on the processor anyway until they recompile to the virtual instruction set. I didn't read the whole Ars article because it's past my bedtime (I'll read it tomorrow at work.) But the author made a comment about framerates "(yet)" -- I didn't see what he was alluding to by the "(yet)" but I got the impression he expects Transmeta to compete beyond the mobile arena.
Another thought I've had is that things just got harder for a company like Intel. It was no easy task for AMD to get big enough where they could afford to be competitive with Intel. But Crusoe-type processors sound like they would be much easier to design and produce...new companies will have a much lower barrier to entry into the competition. Lucky for Transmeta that they have their patents ;)
numb
OK, I'm just an applications geek, and know next to nothing about hardware, so this probably sounds pretty stupid. Live with it.
And the brethren went away edified.
I _know_ what you're saying, I _read_ the Transmeta whitepaper & have a pretty good idea of the concepts behind the Code Morpher, I _know_ how the Transmeta people _want_ the chip to be used, and how a lot of people think it _should_ be used - just as I _know_ that there are going to be some people who will ignore all that & will hack on the VLIW instruction set directly. 99.9% of the people programming for the Transmeta chips won't - but there will be a few that will.
They won't give a damn about backward compatibility, or what the "next" chip is going to implement - they're not programming for money, they're programming for fun, and they'll program using the VLIW instruction set because they'll think they can do it better than the Code Morpher can (for a particular chip, and a particular set of instructions). When they start playing with a new chip, they'll learn the VLIW instruction set for THAT chip and do it all over again.
BTW, regarding some of the replies:
1. "Transmeta's chips transcend backwards compatibility."
Bull.
Transmeta have to create versions of the Code Morpher to be "backwards compatible" with all of the various instruction sets that they choose to support from the other chip companies, plus any "improvements" to the instruction set that those chip companies make. They will have to create a Code Morpher version to run on each new chip that they develop. (Can you say, front-end/back-end?)
If they did a good job architecturally, and made it easy to upgrade the Code Morpher (presumably in FlashROM or something similar), then given the current processor types, it shouldn't be too difficult for them to create new front-ends and back-ends.
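The front-end/back-end split can be sketched like this. It is purely a structural illustration: the shared intermediate form, the chip names `chip_a` and `chip_b`, and the output syntax are all invented, and nothing here reflects the real Code Morphing software.

```python
# Hypothetical sketch: guest-ISA front-ends decode into a tiny shared form,
# and per-chip back-ends emit "native" text. All names and syntax are invented.

def x86_frontend(instr):
    """Decode a two-operand instruction like 'ADD AX, BX' into (op, dst, src)."""
    op, dst, src = instr.replace(",", " ").split()
    return (op.lower(), dst.lower(), src.lower())


def chip_a_backend(ir):
    """Emit made-up native syntax for the first hypothetical chip."""
    op, dst, src = ir
    return f"A.{op} {dst} <- {dst}, {src}"


def chip_b_backend(ir):
    """Emit a different made-up syntax for the second hypothetical chip."""
    op, dst, src = ir
    return f"{op.upper()}_B({dst}, {src})"


FRONTENDS = {"x86": x86_frontend}
BACKENDS = {"chip_a": chip_a_backend, "chip_b": chip_b_backend}


def morph(guest_isa, chip, instr):
    """Translate one guest instruction for one chip via the shared form."""
    return BACKENDS[chip](FRONTENDS[guest_isa](instr))
```

The appeal of the split is bookkeeping: supporting N guest instruction sets on M chips takes N front-ends plus M back-ends instead of N times M separate translators. Whether the maintenance burden stays manageable over time is exactly the open question raised above.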
As time goes on, like any project, the Code Morpher code base will get more complicated & difficult to maintain. They'll make mistakes encoding the instruction sets, and then have to issue updates to correct it, etc.
2. "Code executed through the translation layer should perform better than code executing on the bare metal because the translation software is learning and optimizing."
By definition, a "perfect programmer" will always be able to do AT LEAST AS WELL as an optimizing compiler (even at run-time!), because he or she can USE THE SAME TRICKS as the optimizing compiler (write code which collects metrics & recreates itself based on those metrics). And because the programmer has application knowledge which the compiler doesn't, he or she will most likely be able to DO BETTER.
Like I said before: for the most part, programmers will use what Transmeta gives them - and for a very small fraction of programmers, in the tiny bits of their code where they want to squeeze out everything they can from the hardware, they're going to try to bang on the metal.
Based on the strong reaction to my reply, I'd say that at least a few people have been programming for a living so long, they've forgotten how much fun it is to "push the envelope" of any given piece of hardware.
I'm sorry, YOU aren't getting it.
No matter how good the Code Morpher is, a talented programmer programming "to the bare metal" will be able to do better. A geek screaming for performance on their "baby" doesn't give a damn about whether the next processor will change its instruction set - he (or she) is interested in getting the max. performance out of the CURRENT processor - which DOESN'T mean you let somebody else's software get in the way.
As far as on-the-fly code tuning is concerned, no matter how good the "tuner" is, it can only react to changes & build code AFTER it has accumulated some metrics, whereas a programmer who is intimately familiar with his or her problem-space, can prebuild tuned code for handling most of their expected cases.
I fully expect dedicated hackers to do what every programming freak does - use the provided tools most of the time, and where they want total control & performance, to write the VLIW directly (no matter WHAT the people who made the chip say).
Frankly, ignoring all the hype, this is just a RISCier RISC chip - what the original RISC folks were aiming for in the first place, but which has fallen by the wayside as they tried to compete with Intel.
There are several reasons why Transmeta doesn't want people coding for the native instruction sets. First of all, coding for a native instruction set will just give us the same problem as we have with x86 now -- too many applications to change the architecture, so crappy architecture ends up hanging around way longer than it should. Second, they stated that the instruction sets for the two chips are incompatible, so obviously there is no single "Transmeta Instruction Set". Third, they like the code morphing because it allows them to make fixes that can be downloaded. If people are coding apps to run natively, this can't be done.
But......
I have been thinking about this too and I'm wondering if it would be possible/logical to define some VLIW Instruction Set that could be used on all Transmeta chips, but would be faster and more efficient than translating x86. The CMS would still be translating from the "Transmeta Instruction Set" to the chip's native instruction set, so they could keep all the benefits as before.
Whadyall think?
pdubroy AT yahoo DOT com
In the article there's this paragraph:
Now, let me just stop and say that a number of folks, in their effort to show that they've "seen it all before" and can't be taken in by the hype, have tried to compare Code Morphing to Alpha's FX!32 or to an emulation program like SoftWindows. Such comparisons are like comparing a MinuteMan missile to a bottle rocket. In this case, you should feel free to believe the hype; Code Morphing is cool.
I'd have to say that code morphing has been around. One only needs to look at Executor from ARDI. It dynamically recompiles 68k code into x86 code using an instruction generator. I think ARDI has a whitepaper on this on their site.
Besides that, there's not that much difference between FX!32 and code morphing from the software perspective, except that Crusoe has more hardware support for fixups (via the shadow register file and the gated store buffer), FX!32 runs offline instead of dynamically, and the threshold for code generation is much higher (FX!32 translates based on profile info; Crusoe probably only translates when there are enough blocks to make the translation overhead worthwhile).
In addition, there *has* been work on dynamic recompilation. That's essentially what a JIT is. Or you can look at the 1998 ASPLOS proceedings: there's a paper there describing such a system (called Shogun, I think). Unfortunately, the target arch didn't have all of Crusoe's aforementioned hardware hooks, so the performance isn't quite as high. Even VMware has done this stuff before; VMware started off as SimOS, which had a dynamic translation mode as well as an interpreted mode. It's just that no one had integrated the translator and added the hardware hooks to make it as efficient.
.oOo.
Can we run a Beowulf cluster with it? =-)
Seriously though. The biggest problems with Beowulfs are space and heat, and imagine low-heat low-space processors wedged in there. Makes me horny.
From the mind of the most famous poster in all of slashdot
...is how much faster this thing will run if it's not emulating an x86.
:-)
That is missing the point, IMHO. One of the reasons the chip kicks ass is because they can change the hardware and you can't tell. Write native VLIW on this pig and you're fucked if they change, just like all the other processors.
... this is coming from a guy who prefers assembly to high-level languages in 98% of cases. I think they really struck on something here, don't fuck it up by asking to write in the "native tongue" of this beast. Well, unless you're writing your own processor.
Okay. The Crusoe is fully x86 compatible. Great. But how about developing applications for this processor that skip the translation step, and are already written in the processor's native language? Think about a Distributed.net client written SPECIFICALLY for this processor, with no x86 instructions.....
I'm betting that would speed up apps tremendously. Even Linux....ported directly to Crusoe's native instruction set. The problem I see is, the processor is designed to run x86 out of the box. Code would have to be written to change the Flash ROMs on the processor to bypass translation and hit the core directly, or at least do a straight-through delivery. (Why translate VLIW to VLIW?)
(IF YOU DO THIS AND FRY YOUR CRUSOE, I'M NOT LIABLE.)
-- Give him Head? Be a Beacon?
(If you can't figure out how to E-Mail me, Don't. :P)
Who cares about Transmeta Beowulfs? With the low transistor count and low temp, this chip could do the same SMP-on-a-chip thing that IBM is planning for the PPC. The only reason to have Beowulf at all is that it's more economical than SMP systems; it's not a better solution than massive SMP IF massive SMP can be made cheaply. Of course, some organizations will have a need for Beowulf clusters of massively SMP systems...
...damn it, now I'm horny.
--Shoeboy
I have some concerns about the performance that the Crusoe processors will actually have. The article mentions that translated instructions will be cached and then reused if the CodeMorph software sees them again. However, it seems like the CodeMorph's state information will not be maintained between runs. If you power off the computer, the software loses the cached information and has to start from scratch again. In addition, neither the cache's size nor its location is given. Is it a small cache on die or is it located in system memory? The cache is probably on die for speed reasons, but this would limit the size of the cache. This could be a performance hit, since the cache is also used as a data cache and instruction cache.
Another question concerns the way the instructions are being cached. For example suppose the following instructions were given
ADD AX, BX
SUB CX, AX
JNZ target
Would the translation for each instruction be cached, or is the sequence cached? The article implies that the sequence is cached since the CodeMorph software can optimize the speed on subsequent passes. However, this seems to limit the benefit gained from caching to relatively tight loops or common sequences of code depending on the cache size.
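The worry about cache size can be illustrated with a toy model that caches whole sequences keyed by their starting address. The least-recently-used eviction policy, the capacity, and the addresses are all assumptions made for the sketch; the article does not say how the real translation cache is managed.

```python
from collections import OrderedDict

# Toy translation cache keyed by the address where an instruction sequence
# starts. LRU eviction and the capacities used below are assumptions.


class TranslationCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def run_block(self, start_address):
        if start_address in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(start_address)  # mark as recently used
            return
        self.misses += 1  # a miss stands for "translate this sequence now"
        self._blocks[start_address] = f"translated@{start_address:#x}"
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used


def run_trace(trace, capacity):
    """Replay a list of block start addresses; return (hits, misses)."""
    cache = TranslationCache(capacity)
    for address in trace:
        cache.run_block(address)
    return cache.hits, cache.misses
```

A tight loop over two blocks with room for two translations misses only twice, however long it runs. A loop over three blocks with room for only two misses on every single pass under LRU, which is the pathological case the question above is getting at.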
On a side note, the article implies that the CodeMorph software is light-years beyond anything else. However, some of its highly touted features have appeared in other software before. For example, DEC's FX!32 would initially just translate code, but it would also observe the application's behaviour and then optimize the code based on that after the application finished executing. It could do this optimization several times, optimizing more aggressively on each pass. Apple's 680x0 emulator was also based in ROM and would start up first so that the MacOS could boot. The CodeMorph software has some new features if it really does OO scheduling and optimization on the fly, but that seems like a pretty big hit on performance.
If future server/desktop oriented processors implement large parts of the CodeMorph software in hardware, how will that be any different from AMD's or Intel's processors, since they'll all be implementing a hardware instruction translation unit, apart from the Transmeta core being VLIW? Plus, the transistor count and power consumption will skyrocket along with that.
"When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
Transmeta does NOT want us programming directly in Crusoe VLIW-native code. In fact, the opcodes will NOT be the same on the TM3120/TM5400 chips, and will probably change for all future chips (each model/variation would need its own code morphing software).
The primary reason is that they don't want to have to make these chips backwards compatible. Intel has a lot of problems with this - even the newest Pentium III's must support programs written for 386s. Intel has a hard time because it can't change these opcodes, but instead has to add new ones - hence MMX, SIMD instructions, the Katmai extensions (the P3 stuff), etc (and similarly, AMD has added 3dnow! et al).
Transmeta wants the freedom to be able to drastically change newer models of the CPU to keep it running at optimal speed/efficiency. If they wanted to allow us to write Crusoe-native code, then they'd need morphing software that allows newer models to morph old code to its own (modified) native code. In other words, a real pain in the rear and definitely a problem if Crusoe can't run different "morphers" simultaneously (which I suspect it can't).
As for other morphing software to emulate other processors: I wouldn't be surprised if they allowed it to emulate some other chips - like the PPC, so it can run MacOS stuff - but it won't run nearly as well as x86 emulation will. The chip is meant to be able to morph code from many different platforms, but there are a lot of shortcuts to emphasize x86. I think that topic is addressed in the Ars Technica stuff, but basically Crusoe uses an FPU very similar to the x86 one. I think there are some other things for that in hardware, as well as the fact that we know they're dedicating most of their time to creating the x86 morphing software so it will be the most optimized.
I highly doubt that we'll be able to write our own morphers. I think that it's an extremely difficult thing to do, it would require knowledge of the Crusoe instruction set (which, as I said above, they don't want to release), and the morphing software is probably authenticated somehow. Since the morphing code is running in Flash ROM, it can be upgraded, but if someone tried to load a morpher that doesn't work they're gonna have trouble reverting back to x86.
Linus said that "Mobile Linux" is NOT a code fork - it's just the x86 version with a few modifications to make it run better on embedded platforms. Why reinvent the wheel?
Keep in mind that this is all SPECULATION - if anyone here has other information to the contrary, I'd like to hear it =)
-- Imagine how much more advanced our technology would be if we had eight fingers per hand.
...is how much faster this thing will run if it's not emulating an x86. It looks pretty hot under the hood, especially if, instead of using standard guess-aheads, you can tell it which branch to use as the default, or even tell it about branches ahead of time (which you often know well before the actual conditional looping operation), so it's not guessing at all.
There's all kinds of fun I could have with this chip...
I also wonder whether it can multitask between different instruction sets. I guess the task switching overhead would be pretty brutal if there isn't room onchip for multiple instruction sets.
They have essentially built a Japanese Compact Car that is fuel efficient, and not an Italian sports car.
Efficiency isn't exactly exciting. Unless I am using a Palm Pilot, I really don't care if my PentiumIII or Alpha is sucking 34W and my Nvidia GeForce is sucking another 30. What I care about is how fast my performance is. How many transactions can I run? How many frames per second am I getting? How many polygons can I push?
Crusoe may be important for the coming ubiquitous computing revolution (if it ever happens), but they are not the first to go after low power (remember Rise? Remember the IDT WinChip? Don't forget StrongARM)
I think Crusoe is a nice chip, but the *HYPE* (and I mean hype) caused by deliberate secrecy and press leaks thoroughly destroyed any chance of it being seen as revolutionary in my eyes.
The Code Morphing technology is not revolutionary. Emulators have been doing dynamic instruction set recompilation for years now. DEC did it with FX!32, Sun does it with Java JITs (including HotSpot, which does recompilation based on runtime profiles), Smalltalk VMs have been doing it, hell, even one of the Commodore 64 emulators does it if I recall. John Carmack's Quake3 engine even does it. I'm sure there are hundreds of projects in academia that have been doing it. The only relevant difference is the hardware assist that the Crusoe has.
Chances are, when you hype something too much, it's going to be disappointing. There's a thread on Usenet that claims Transmeta's *ORIGINAL* goal was not low power, but the best performance, but when they couldn't attain it, they "fell back" to a low power selling point. I think it's in comp.arch.
That's the whole point of Crusoe, you DON'T code for it directly. It takes other instructions, starting with x86, and runs them faster, better, and optimizes on the fly.
The "code morphing" layer is what makes Crusoe stand apart from the rest. It optimizes on the fly the instruction set it's running. This means that your apps will run faster and faster as they run. This layer is what gives the Crusoe its speed. Coding natively would be SLOWER than using the morphing layer. You also don't get the benefit of the optimization.
Also, the instruction sets are different for each chip. Each set is further optimized for what its use is going to be. So if you code for one Crusoe chip natively, it doesn't run on the other. This lets Transmeta change the instruction set as needed. For example, if it's faster to do something one way, they can change it and not break compatibility with anything. And they can give you the update with a software patch.
So, it doesn't matter if people don't have the instruction set for the native Crusoe processors. They will change a lot, and every time they change you would have to recode every program again. Why bother? Also, you don't get to use what the Crusoe processor is all about: its code morphing layer.
So, PLEASE, stop complaining that you can't code natively for this chip. The code won't go any faster, and as soon as Transmeta changes the set, your programs wouldn't run anyways. So it's a moot point to code natively for it.