Ars Technica Gets Into Crusoe
redmist writes "Ars Technica has a great, in-depth article about the new Crusoe chips. Enjoy." This one will answer most of the questions I've heard about Crusoe's guts, and how it differs from other microprocessors. "Must" reading for all hardware junkies!
There is a lot of glitzy information now available about Crusoe's VLIW, a core instruction set that is nothing like x86, and the code morphing software. But the actual technical nitty-gritty seems to be lacking. Can a program get access to the core instruction set, thus bypassing the code morphing? Is it possible to detect the Crusoe processor with an x86-compatible instruction, so that in performance-critical sections of an application Crusoe-specific/pre-morphed code can be run if the Crusoe is detected, while the application can still execute standard x86 code if it isn't? Can a programmer provide their own code morphing software, thus turning Crusoe into a fast Z80, for example? Does Transmeta have plans to code morph other instruction sets like PPC? And does "Linux Mobile" contain any Crusoe-specific instructions, or does it depend completely on the software code morph of x86?
PPC chips aren't really aimed at the mobile market. I want to see Crusoe vs StrongARM.
-Yarn - Rio Karma: Excellent
Now I suppose Transmeta could design a full O-O-O core, but I don't see the point. If the software does a good job, the additional flexibility they gain to change the underlying machine is worth it.
As far as branches go, yes, you usually can guess a backward branch is going to be taken. But branches are still a huge problem. It's tough to keep a processor core fed. And don't even get me started on multiple branch prediction. The hit rate goes way down. A study was done here that showed processors today (or in the near future) spend about half the time recovering from branch mispredictions. That's a lot of wasted work. While the code morphing software can't do a perfect job, it is somewhat easier to tune the chip. And then think about per-application tuning. Load a different set of rules depending on the program you're running.
Interesting, no? :)
--
- This architecture allows for some interesting optimizations not feasible in conventional CPUs.
- High performance on the desktop is also interesting: "So you see, they made the Code Morphing software extremely modular. They can implement whatever parts of it they like in hardware to get whatever degree of performance gain they want. Crusoe should be viewed more as a proof of concept than as the ultimate outcome of 5 years of work. Crusoe represents one extreme of a spectrum that stretches from "implement the bare minimum in hardware" to "implement everything in hardware." Now that Transmeta has a technology that's proven to work in the most difficult case (where 2/3 of the transistor logic has been moved into software), they can go back in the other (easier) direction and start putting stuff in silicon.
I, for one, am really excited about the possibilities."
"Crusoe's Code Morphing software not only keeps track of which blocks of code execute most often and optimizes them accordingly, but it also keeps track of which branches are most often taken and annotates the code accordingly. That way, Crusoe's branch prediction algorithm knows how likely a branch is to be taken, and which branch it should speculatively execute down. If a branch isn't particularly likely to go one way or the other, then Crusoe can speculatively execute down both branches.
Contrast this with speculative execution done on a normal CPU, where hardware limitations like buffer and table sizes limit the amount of information you can store about a particular branch and its execution history. Since Code Morphing keeps track of the branch histories in software, it can record a more finely grained description of the execution patterns of a wider window of code, and therefore assess more accurately whether or not a specific branch is likely to be taken."
Furthermore, since there's a software layer between the ISA of the binary and the machine's native ISA, Transmeta is free to beef up the execution engine (or any other part of the core) however they like, because the only thing that will require a recompile is the Code Morphing software. A case in point is the two chips in its product line. Each has a slightly different core (the Windows chip has special instructions in it that help speed up Windows), but they both are fully x86 compatible. There's nothing to keep them from stuffing new functions and features (SIMD anyone?) into the silicon, to help scale the product as high up as they want to go with it.
I'd say that it's only a matter of time before we hear an announcement of another product line from Transmeta. It won't be named Crusoe, because it won't be aimed at the mobile and embedded markets. It'll be a workstation and server class x86 CPU that runs Linux like a fiend, and it'll compete directly with Intel's IA-64. I can't wait."
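To make the quoted bit about software branch histories concrete, here is a minimal sketch of what "keep counts per branch and advise the core" could look like. This is not Transmeta's code; the class name, the 0.7 confidence threshold, and the "both" answer for coin-flip branches are assumptions made purely for illustration.

```python
from collections import defaultdict


class BranchProfile:
    """Toy software-side branch history (hypothetical, not Transmeta's design)."""

    def __init__(self, confidence=0.7):
        # Hypothetical threshold: how lopsided a branch must be before we commit.
        self.confidence = confidence
        self.taken = defaultdict(int)  # branch address -> times taken
        self.seen = defaultdict(int)   # branch address -> times executed

    def record(self, branch_addr, was_taken):
        """Update the history after a branch has actually executed."""
        self.seen[branch_addr] += 1
        if was_taken:
            self.taken[branch_addr] += 1

    def advice(self, branch_addr):
        """Return 'taken', 'not_taken', or 'both' (speculate down both paths)."""
        total = self.seen[branch_addr]
        if total == 0:
            return "both"  # no history yet, so no basis for a guess
        ratio = self.taken[branch_addr] / total
        if ratio >= self.confidence:
            return "taken"
        if ratio <= 1 - self.confidence:
            return "not_taken"
        return "both"
```

Because the tables live in ordinary memory, nothing but RAM limits how many branches get tracked, which is the contrast with fixed-size hardware tables that the quote is drawing.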
--
My opinions may have changed, but not the fact that I am right.
****Gfx Scrollbar Special case hit!!*****
Regardless, it will have to be turned off even if it is for battery changes.
Not to be an ass, but my Palmpilot doesn't lose its data when I switch batteries. Nor do (new) VCRs lose their programming on power loss. Devices called super capacitors (5V 1F style things) keep enough energy around to keep very low power components up and running in a sleep mode to ride out such interruptions.
I'm curious to know how OSes will handle this. For example, we've already had a thread on the linux-kernel list about timing loops being thrown off by this for existing laptops (because the bogomips on which they're based are calculated at boot time). What was the outcome of that thread? Was a solution reached? Will it apply for Crusoe too?
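For anyone wondering why a boot-time calibration goes wrong, here is a rough sketch of the arithmetic. It is not kernel code: the function names and the loop rates in the usage note are made up, and it assumes the delay is a plain busy loop whose speed scales with the clock.

```python
def calibrate(loops_per_second_at_boot):
    """Boot-time calibration: busy-loop iterations that fill one microsecond."""
    return loops_per_second_at_boot / 1_000_000


def actual_delay_us(requested_us, loops_per_us_at_boot, loops_per_second_now):
    """How long a delay really lasts once the clock speed has changed."""
    # The loop count is fixed by the stale boot-time calibration...
    iterations = requested_us * loops_per_us_at_boot
    # ...but the time it takes depends on how fast the loop runs right now.
    return iterations / loops_per_second_now * 1_000_000
```

With these hypothetical numbers, a machine calibrated at 600 million loops per second that has since dropped to 300 million turns a 10 microsecond delay into 20. Going the other way is worse: calibrate slow, then speed up, and the delay comes out too short.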
"The invisible and the non-existent look very much alike." -- Delos B. McKown
Only if Apple (who didn't invent the PowerPC instruction set; it's a derivative of the IBM POWER instruction set) have some form of intellectual-property rights for the instruction set.
If there are any such rights owned by Somerset, Apple might also have some say in licensing it.
"The specs", in the sense of the instruction set specifications for PowerPC, are publicly available, although if the chip+software is intended to look like a particular PowerPC chip (I think the MMUs may differ, e.g. may have software TLB reload on some processors and hardware TLB reload on others), they'd need that spec as well (I think the specs for various PowerPC chips are also publicly available).
If somebody wanted to clone not only some PowerPC CPU but a support chip set for it, so that they could run OSes such as MacOS unmodified, that stuff might have to be changed...
...but that's just cloning a Mac, which Apple isn't allowing even if you use existing PowerPC chips.
Of course, there is the possibility that Apple would want to use a Transmeta chip in a Powerbook, say, in which case the Apple licensing issues go away.
Apple are unlikely to be the ones to block such a Code Morphing(TM)(R)(LSMFT) layer; they don't, as far as I know, have a problem with people building non-Mac-compatible PowerPC machines, and they already have, as far as I know, the tools to block people from building Mac clones.
"Apply them all to hardware" in what sense? The binary-to-binary translators for Crusoe chips are software; they just happen to be running on hardware that offers some assistance, but the translation itself isn't done by hardware (and happens at a layer below even the lowest-level OS code; as far as the OS is concerned, all the way down to the lowest level, the chip looks like an x86).
Correct. Compilers for the AS/400 (and its System/38 predecessor) for the languages in which applications are written generate code for a virtual machine with a very CISCy instruction set; low-level OS code translates that to the native instruction set. (That long antedates Transmeta; as indicated, it dates back to the System/38, which I think came out in the late '70's; IBM needed no technology from Transmeta to do that - binary-to-binary translation is hardly a Transmeta invention.)
It isn't done in exactly the same fashion, in that, on S/38's and AS/400's, the low-level OS code is written in languages that compile (or, for some code, assemble) into the native machine's instruction set, unlike Crusoe, where the only native code that's run is the translation software and the output of the translation software. Also, I don't think the translation on AS/400 is done as dynamically; I think programs are translated in their entirety the first time they're run, and the executable code for the entire program is kept around.
Yes, it's GPLed.
Where is the source? Read the GPL -- they don't have to release the source until they distribute the code. Mobile Linux hasn't been released yet, so they can sit on the source for now. Linus has promised it will be available RSN.
Steven E. Ehrbar
Comparisons to the PowerPC chips.
After all, the Crusoe architecture is not a performance demon aimed at desktops/servers, and it is not aimed at the ultra-low power consumption StrongARM market. But it might be suitable for the sorts of applications that embedded PPCs are currently used in...
Steven E. Ehrbar
No, no, no, **YOU** STILL DON'T GET **IT**.
As far as I can tell, the Crusoe processor engine itself is not special. If you are a "talented programmer programming to the bare metal", you might as well program in assembly on another pre-existing chip.
And then as a chip manufacturer, you'll face 20 years of trying to support the vintage instruction set that those bare metal hackers employed.
You're missing the point.
Take database servers. Oracle, MySQL, Informix, Sybase, Uncle Joe's Ultimate Data Thingy... Just about all of them allow access to their data through a standard SQL language.
But... But... but... Wouldn't it just be so insanely cool and fast if I could just directly access the ISAM structures and indexes and modify disk sectors directly?!?! I fully expect every dedicated DBA and application designer to go to the bare iron to squeeze performance from their data warehouses!
Has that happened? No. Why? Because MOST, EVERY DAY APPLICATION DESIGNERS DON'T "PROGRAM TO THE BARE METAL". It's too complex, intensive, and fruitless a task. Why is Slashdot written in Perl and not assembly? Why isn't Linux 100% x86 assembly?
There is a BIG difference between just a cool hack and maintainable elegance.
Why do we have high level languages? Why do we have abstraction layers? Why?
The Code Morphing is an abstraction layer. Initially, that layer is the x86 instruction set, an arbitrary set of instructions that just happens to currently be widely used. Using Code Morphing, the Crusoe can leapfrog on that wide base of support, while throwing away the hardware architectural garbage traditionally needed to support it.
Back to SQL: Oracle supports SQL for access to data, but beneath, I'll bet you that a lot of the specific operations upon data that those SQL statements fire off have changed ENORMOUSLY over the years. What would have happened had they allowed programmers straight past the abstraction layer? They still would be trying to support that API today, and I bet they wouldn't be as free to rework their server software.
Furthermore, why do we have the DBI module and DBD modules in Perl? To provide a semi-universal abstraction layer across all databases. When one database's API changes for performance reasons, efficiency, whatever, you just change the morphing-- er DBD-- layer to accommodate it.
What is the point of Crusoe then?
Not to provide assembly hackers with a new opcode set to learn and tweak, which 90% of the application design world will never learn or exploit, and therefore will remain voodoo essentially.
The point is to provide an architecture which supports ABSTRACTION LAYERS of assembly opcodes. So Transmeta is free to vary the underlying hardware in any exotic or esoteric form they see fit, throwing backwards compatibility of their VLIW opcodes to the wind because the Code Morphing allows the SAME ABSTRACTION LAYER API to be exposed to the application designer.
Now, finally, note I keep saying 'application designer'. This is as opposed to 'dedicated hacker'.
Read the definition of a hack. The first two definitions are not my idea of elegance. Something that's quick and does the job but not well. Or, something that is incredibly good, but took a long time.
Now, read the definition of elegant. Something that combines simplicity, power, and grace. Something that is understandable, almost obvious in its expression. Something maintainable.
Tell me what's more maintainable: Assembly code for the Mx-650938 processor, or Java code. It's a close call, but I'll have to go with the Java code. It's harder to write a hack in Java, than it is to create an elegant design in assembly.
It's not about performance. We haven't even BEGUN to wring the performance from the chips we have-- and why? because it's not humanly possible for every applications designer to be a brilliant assembly hacker, which is why we have compilers!
So, finally, why spend your time learning the latest opcode set when you can just focus on a higher level language and leave the hand tweaking and performance tweaks to the man behind the curtain of the Code Morphing abstraction layer of OZ?!?!?!
One of the things that Crusoe supposedly does is it caches frequently used code in its "compiled" form. This means that you only take a performance hit the first time you run it, and then it should run pretty much at full speed.
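The caching idea is easy to sketch. What follows is a toy, not how Transmeta does it: the class, the use of a block's address as the cache key, and the `translate` callback are all assumptions for illustration.

```python
class TranslationCache:
    """Toy translate-once, run-many cache keyed by code block address."""

    def __init__(self, translate):
        # `translate` is a hypothetical stand-in for the expensive x86-to-native step.
        self._translate = translate
        self._cache = {}
        self.translations = 0  # how many times we paid the translation cost

    def run_block(self, address, x86_bytes):
        """Return native code for a block, translating only on first sight."""
        if address not in self._cache:
            self._cache[address] = self._translate(x86_bytes)
            self.translations += 1
        return self._cache[address]
```

Call `run_block` on the same address a thousand times and `translations` stays at 1, which is the "performance hit only the first time" being described. A real system would also have to notice when the original code changes after it has been translated; that is left out here.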
If they give you access to the underlying architecture, then they are committed to keeping that architecture in future versions. This way they can make up a new ISA for every chip, and just tweak the code morphing layer to make it work.
This gives them a performance hit now, but as Intel is forced to continue to support the x86 architecture in hardware for every new chip, they will have to make their chips ever bigger and ever hotter. Transmeta's approach will likely prove superior in the long run.
Because writing a new code morpher for this architecture would take R&D dollars that would be better spent emulating real architectures like PPC or IA64 with existing user bases. The small increase in performance you'd get from a "native" ISA would not justify the additional costs of writing and supporting the software for it.
Also, it sounds like they are optimising each chip to specifically support code morphing from a specific architecture. That means that x86 *is* a reasonably efficient instruction set for this particular hardware. Yes, you could probably make a faster one, but the gains would be marginal unless you actually got direct access to the underlying ISA, which defeats the whole purpose of this strategy.
It seems a lot of posters are thinking the same thing. But...
You could say the same about a Celeron/P-III/Athlon/Whatever.
"I wonder how much faster my Athlon would go if I could rip out the silicon that does the instruction decoding / reordering / branch prediction / etc and code directly for the execution units."
It probably wouldn't go much faster (I'd guess that silicon does its job pretty well) but by ripping out all those transistors you could significantly reduce power consumption.
In fact, if you think it through for five years or so you'll probably wake up one day and find you've re-invented Crusoe. Of course it'll be old news by then.
like the Ars article, it was well written. I think the Crusoe is impressive because it does what RISC was originally conceived to do. Look at MIPS: it's a RISC architecture, yet it has some of the most complex processing units you'll find. Things like Crusoe and MAJC really rattle the cages of other chip makers because they take an entirely different approach to chip design. Even PPC is getting really complex, especially with the AltiVec unit added onto the die; while it improves performance in some calculations, it adds significantly to the price and complexity of the chip. The human brain can calculate some pretty complex things, yet its processing is done in a massive number of simple processes rather than a small number of complex ones. I think the next generation of supercomputers will be built a little more like Crusoe chips, maybe even using Crusoes. The more times it works a calculation, the faster it does it; this would add phenomenal performance to a lot of the things we use supercomputers for right now. Maybe in the next ten years we'll see desktop teraflop systems.
I'm a loner Dottie, a Rebel.
The best way for games to run would be like what nVidia is doing with the GeForce's GPU: the GPU handles the graphics, the transforms, and all the heavy-duty FPU calculations, while the system's CPU handles the actual code of the game. The instruction set I would guess is best for gaming is true RISC; it gets the job done as simply and quickly as possible. Games, like any graphical program, pose a challenge to processors and programmers because you have two things going on: the data manipulation and control of the program, and then the manipulation of the graphics. Look at any CLI program: it usually has a single job to do at a time and can work in order. Quake needs to do 30 things at once.
I'm a loner Dottie, a Rebel.
Ahem...
IF YOU WANT TO CODE DIRECTLY TO A VLIW CORE BUY A &*$#ING MERCED!!!!!!
Sorry about that. You lot just aren't getting it. If you remove the code morphing layer, then you have to put backwards compatibility into the hardware down the road. That means lots o' transistors and high power consumption 2 or 3 years down the road. That also means that compiler complexity goes up dramatically. So you'll wind up having a crippled architecture and low quality compilers 10 years down the road. That's stupid. Additionally, if the compiler is entirely responsible for the optimization, you lose the nifty on-the-fly code tuning based on actual runtime data -- this is the coolest thing about the Crusoe.
--Shoeboy
Regardless, it will have to be turned off even if it is for battery changes. Also people may drain the battery after using it for a while, requiring a power down. These points aside, even low power devices today are turned off (laptops, Palm Pilots, etc.) since even standby mode drains too much power. The Crusoe systems will probably drain more power than a Palm Pilot so you'll probably need to turn it off.
"When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
I also wonder whether it can multitask between different instruction sets. I guess the task switching overhead would be pretty brutal if there isn't room on-chip for multiple instruction sets.
My understanding from the articles I have read is that maybe, eventually, but right now it only emulates x86.
My whole point was that branch prediction can be replaced by explicit pre-branch notification.
Branch prediction now is very stupid. Circuits try to guess, in real time, which branch will be taken. Instead, the C compiler could explain to the branch "predictor" that "this will loop 27 times, then stop looping".
Furthermore, explicit cache requests could be compiled. "I'll stay in this function for a while, but I'm also going to call these functions."
With profile-based optimizations and careful design you might never have a cache miss or a branch misprediction.
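Here is a toy comparison of the two approaches on exactly that 27-iteration loop. The guessing side is a textbook 2-bit saturating counter, chosen as a simple stand-in and not meant to model any particular CPU; the "hinted" side assumes the compiler can hand over the trip count before the loop starts. All names are made up.

```python
def loop_outcomes(trip_count):
    """Loop-back branch: taken trip_count - 1 times, then falls through once."""
    return [True] * (trip_count - 1) + [False]


def two_bit_mispredictions(trip_count, runs):
    """Mispredictions of a 2-bit saturating counter over repeated runs of the loop."""
    state = 2  # 0-1 predict not taken, 2-3 predict taken; start at weakly taken
    misses = 0
    for _ in range(runs):
        for taken in loop_outcomes(trip_count):
            if (state >= 2) != taken:
                misses += 1
            # Saturating update: drift toward whatever actually happened.
            state = min(3, state + 1) if taken else max(0, state - 1)
    return misses


def hinted_mispredictions(trip_count, runs):
    """Mispredictions when the trip count is announced before the loop starts."""
    misses = 0
    for _ in range(runs):
        remaining = trip_count - 1  # taken branches still to come, per the hint
        for taken in loop_outcomes(trip_count):
            if (remaining > 0) != taken:
                misses += 1
            remaining -= 1
    return misses
```

Run the loop ten times and the counter misses once per run, always at the exit, while the hinted version never misses. That only holds when the trip count really is known ahead of time; data-dependent branches don't give you that luxury.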
I've gotta get me one of these, and play around with alternative opcode sets. This is just the coolest toy for exploring computer architecture.
One would not write in the native VLIW. One would create a new instruction set that hid the VLIW but used its best features and interacted with the hardware better, i.e. saved the optimizations and the branch predictions for the next time the program is run. One need not write in VLIW to get rid of the x86 instruction set. (I wonder if one could design an instruction set to run one's favorite operating system: linux, *bsd ...)
Why not save the cache to permanent storage? The processor optimizes the code and then saves the optimized code to disk as a "shadow" executable. The next time the program is loaded, the OS would indicate that it has already been optimized and pass the shadow to the processor, which could bypass the translator. The translator could attach a signature to the shadow, and if the signature didn't match the program, it would reload the program and translate from scratch. In this way, you would get permanently optimized code for all your programs while retaining the flexibility of the current design.
Of course, one problem with this would be getting support for shadow programs built into the OS. I wonder if Transmeta has anyone that could handle this?
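A minimal sketch of the shadow-executable idea, under a pile of assumptions: the signature is taken to be a SHA-256 hash of the original binary, a dict stands in for the disk, and `translate` is a placeholder for the real optimizer. None of this is anything Transmeta has described.

```python
import hashlib


def signature(binary: bytes) -> str:
    """Fingerprint of the original program (assumption: SHA-256 of its bytes)."""
    return hashlib.sha256(binary).hexdigest()


class ShadowStore:
    """Toy persistent store of translated 'shadow' executables."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical stand-in for the optimizer
        self._disk = {}              # path -> (signature, shadow); stands in for files
        self.translations = 0

    def load(self, path, binary):
        """Reuse the shadow if its signature matches, else translate from scratch."""
        sig = signature(binary)
        entry = self._disk.get(path)
        if entry is not None and entry[0] == sig:
            return entry[1]  # shadow is still valid, skip the translator
        shadow = self._translate(binary)
        self.translations += 1
        self._disk[path] = (sig, shadow)
        return shadow
```

Loading the same unchanged program twice translates it once. Upgrade the binary and the signature no longer matches, so it gets retranslated, which is the safety net the signature is there to provide.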
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
The "code morphing" layer is what makes Crusoe stand apart from the rest. It optimizes, on the fly, the instruction set it's running. This means that your apps will run faster and faster as they run. This layer is what gives the Crusoe its speed.
The only way "code morphing" could run faster than native code is by exploiting runtime information to perform optimizations that are not possible at compile time. In other words, self-modifying code that runs faster than static code.
This is plausible, but that doesn't mean there would be no performance benefit in compiling native code. Research on self-modifying code is not unique to Crusoe---it's a very active area of research, and there are two major kinds: JIT and dynamic compilation. JIT, which you're probably all familiar with from Java, involves translating code (typically from a foreign instruction set) and performing optimizations at runtime; dynamic compilation involves "staging" code at compile time to modify itself in a disciplined manner at runtime. JITs and dynamic compilation are very different in the nature of optimizations they perform; one of the major differences is that because dynamic compilation performs its analysis at compile-time, it can theoretically perform much deeper and more sophisticated optimizations.
Crusoe does no staging (it can't: it executes fully precompiled code), so its optimizations operate under severe time constraints. Therefore, Crusoe's code morphing is likely to produce code optimality akin to that emitted by a JIT compilation system: shallower analysis, shallower optimizations. Which almost certainly makes Crusoe's "code morphing" worse than native staged dynamic compilation would be.
In summary: my point is that self-modifying native code that improves its performance at runtime is entirely possible without "code morphing". On the other hand, binary x86 compatibility is arguably Crusoe's major selling point, so there's not much impetus for them to bother encouraging any kind of native code compilation. Anyway, I get the impression that Crusoe's entire architecture would have to be revamped if they wanted to run native code so it's a moot point.
If you're thoroughly confused by now, try visiting the dynamic compilation project at the University of Washington for more information on dynamic compilation.
~k.lee
(BTW: this does not mean that Crusoe does not embody any technical innovations. In particular, the hardware support the chip provides for its runtime code translation is very interesting.)
(remove nospam for email)
You really, really still don't get it, do you? Firstly, Crusoe is the first chip Transmeta has got out the door. It's the simplest possible silicon, with the hard bits done in software. But there's no hard line between what functions can be done in hardware and what can be done in software. It's just that software is cheaper to tune.
When Transmeta have got code-morphing tuned the way they like it there is nothing to stop them releasing a new chip with the code-morphing engine in hardware.
But even if they don't, the limitation on performance computing design is cooling, as Cray amply showed. Crusoe consumes 1/32 the power of your PIII; so, for a given cooling system, you can stick 32 Crusoes in the same box. If each Crusoe gives you 66% of the compute power of the PIII, you've got a box which is going to deliver you more than 21 times the number of polygons your PIII can push.
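The arithmetic behind that "more than 21 times", spelled out. The 1/32 power figure and the 66% figure come from the paragraph above; perfect scaling across all the chips is an assumption that real workloads rarely honor, and the function name is made up.

```python
def relative_throughput(power_fraction, per_chip_speed):
    """Throughput vs. one PIII at equal total power, assuming perfect scaling."""
    chips_in_same_budget = 1 / power_fraction  # e.g. 1/32 the power -> 32 chips
    return chips_in_same_budget * per_chip_speed


crusoe_box = relative_throughput(power_fraction=1 / 32, per_chip_speed=0.66)
```

That works out to 32 x 0.66 = 21.12, so the figure holds as an upper bound. Anything that doesn't split cleanly across 32 processors will see less.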
One thing I haven't yet seen quoted is the part-price for a Crusoe, but if the silicon is as simple as people are suggesting the part-price could be very low - small dies have relatively lower reject rates because if you have one flaw per square inch, every inch square chip has a flaw whereas only one in ten 0.3 inch chips does.
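The die-size point can be checked with a quick model. This sketch assumes flaws land independently at random (a Poisson model, which is a simplification and not necessarily how any fab computes yield) and reads "0.3 inch chips" as 0.3 inch on a side; both are assumptions, as is the function name.

```python
import math


def flawed_fraction(side_inches, flaws_per_sq_inch=1.0):
    """Fraction of square dies with at least one flaw, under a Poisson model."""
    area = side_inches ** 2
    expected_flaws = flaws_per_sq_inch * area
    # P(no flaws) = exp(-expected), so P(at least one) is the complement.
    return 1 - math.exp(-expected_flaws)


small_die = flawed_fraction(0.3)  # about 0.086
large_die = flawed_fraction(1.0)  # about 0.632
```

Under this model roughly 9% of the small dies are bad, close to the "one in ten" quoted, while about 63% of the inch-square dies are bad rather than literally every one. The conclusion is the same either way: the reject rate climbs steeply with die area.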
By contrast your PIII is inherently an expensive part - it isn't expensive because Intel are profiteering, it's actually expensive to make. If Transmeta start shipping Crusoes at (say) around $10 per part in quantity, there isn't any way Intel can compete anywhere along the line.
I currently run two PII/300s in my desktop box. I bought them because two 300MHz parts and a motherboard to accommodate them were, at the time I bought them, a lot cheaper than one 500MHz part. If I can get, say, 8 400MHz Crusoes for the price of one 700 MHz Intel part, I will be quite happy to run them, and so I expect will a lot of other people.
Assuming, of course, that Linux 2.4 will run 8-way parallel on Crusoes, but I'm kind of prepared to bet it will :-)
I'm old enough to remember when discussions on Slashdot were well informed.
I'm sorry, I don't get it. Maybe I'm just dense. Why do all this "morphing" and optimizing at runtime, instead of at compile time? Binary compatibility with existing processors is a nice feature, and I'm sure it will help Crusoe get a foothold in the market, but why can't we at least have the option of bypassing the emulation when native software becomes available? (Or does the Crusoe already allow this? The reports haven't been clear on that.)
MSK
I originally posted this in a previous crusoe article but no one commented on whether it's actually feasible or not. Any big brain VLIW gurus want to tell me if what I suspect might actually be true?
The quake3 performance we saw on the ZDTV webcast was pretty damn impressive. Everyone seems to be assuming that they had 3d accelerators in those TM5400 laptops.
You can run quake 3 in software mode under mesa at about 3 frames per second.
But this is Transmeta we're talking about, and that was Dave Taylor, the SAME Dave Taylor that once leaked a document onto Usenet ranting about the inferiority of hardware graphics accelerators and saying that what he really wanted was a generic parallel processing chip that could do arbitrary transforms.
GEE, a lot like the crusoe chip can do?
(anyone got the link to that usenet posting on deja that dave taylor tried to cancel?)
Isn't it feasible that they have put hooks into their code morphing software that optimises specially for 3d transforms and mesa/opengl?
Especially in the linux version? Where they have all the source code to linux and mesa?
Hmm, what fancy optimisations could those clever brains come up with?
Maybe those Transmeta laptops WON'T need 3D accelerator chips?
And it would completely defeat the purpose of a low power laptop to put a big, hot, power-sucking 3D chip in it. So I'm assuming that demo of Quake 3 they showed WAS running in software mode with some pretty fancy dynamic optimisations going on.
Maybe the reason they didn't make a big deal about this is that it's still a "work in progress" as Linus said about mobile linux so they don't want to hype it yet.
Someone prove me wrong?
Quote:
Early in the next century, Dean hopes his new concoction, which he says is "in the idea and invention stage," will be ready for the public: a sleek tablet that is magazine-size, inexpensive, programmable, and voice-activated. He expects his unnamed dream pad, which will run on a 24-hour battery, to provide everything a PC does, including streaming audio and video, word processing, and spreadsheets. It will even have a port for old fogies who can't give up their keyboards. And it will wirelessly put the Internet and other information at your fingertips.
End Quote.
Of course the article never mentions Transmeta, but I bet this web pad would be powered by Crusoe. Here's the link for the article.
The current instruction sets of most processors are probably designed based on certain price:performance ratios, taking the cost of producing them as hardware as a major consideration. Transmeta could come up with their own virtual instruction set that would be optimized for their chips. It would be an easy move for the software developers since their old code could still run on the processor anyway until they recompile to the virtual instruction set. I didn't read the whole Ars article because it's past my bedtime (I'll read it tomorrow at work.) But the author made a comment about framerates "(yet)" -- I didn't see what he was alluding to by the "(yet)" but I got the impression he expects Transmeta to compete beyond the mobile arena.
;)
Another thought I've had is that things just got harder for a company like Intel. It was no easy task for AMD to get big enough where they could afford to be competitive with Intel. But Crusoe-type processors sound like they would be much easier to design and produce...new companies will have a much lower barrier for entry into the competition. Lucky for Transmeta that they have their patents
numb
OK, I'm just an applications geek, and know next to nothing about hardware, so this probably sounds pretty stupid. Live with it.
And the brethren went away edified.
I _know_ what you're saying, I _read_ the Transmeta whitepaper & have a pretty good idea of the concepts behind the Code Morpher, I _know_ how the Transmeta people _want_ the chip to be used, and how a lot of people think it _should_ be used - just as I _know_ that there are going to be some people who will ignore all that & will hack on the VLIW instruction set directly. 99.9% of the people programming for the Transmeta chips won't - but there will be a few that will.
They won't give a damn about backward compatibility, or what the "next" chip is going to implement - they're not programming for money, they're programming for fun, and they'll program using the VLIW instruction set because they'll think they can do it better than the Code Morpher can (for a particular chip, and a particular set of instructions). When they start playing with a new chip, they'll learn the VLIW instruction set for THAT chip and do it all over again.
BTW, regarding some of the replies:
1. "Transmeta's chips transcend backwards compatibility."
Bull.
Transmeta have to create versions of the Code Morpher to be "backwards compatible" with all of the various instruction sets that they choose to support from the other chip companies, plus any "improvements" to the instruction set that those chip companies make. They will have to create a Code Morpher version to run on each new chip that they develop. (Can you say, front-end/back-end?)
If they did a good job architecturally, and make it easy to upgrade the Code Morpher (assumedly in FlashROM or something similar), then given the current processor-types, it shouldn't be too difficult for them to create new front-ends and back-ends.
As time goes on, like any project, the Code Morpher code base will get more complicated & difficult to maintain. They'll make mistakes encoding the instruction sets, and then have to issue updates to correct it, etc.
2. "Code executed through the translation layer should perform better than code executing on the bare metal because the translation software is learning and optimizing."
By definition, a "perfect programmer" will always be able to do AT LEAST AS WELL as an optimizing compiler (even at run-time!), because he or she can USE THE SAME TRICKS as the optimizing compiler (write code which collects metrics & recreates itself based on those metrics). And because the programmer has application knowledge which the compiler doesn't, he or she will most likely be able to DO BETTER.
Like I said before: for the most part, programmers will use what Transmeta gives them - and for a very small fraction of programmers, in the tiny bits of their code where they want to squeeze out everything they can from the hardware, they're going to try to bang on the metal.
Based on the strong reaction to my reply, I'd say that at least a few people have been programming for a living so long, they've forgotten how much fun it is to "push the envelope" of any given piece of hardware.
I'm sorry, YOU aren't getting it.
No matter how good the Code Morpher is, a talented programmer programming "to the bare metal" will be able to do better. A geek screaming for performance on their "baby" doesn't give a damn about whether the next processor will change its instruction set - he (or she) is interested in getting the max. performance out of the CURRENT processor - which DOESN'T mean you let somebody else's software get in the way.
As far as on-the-fly code tuning is concerned, no matter how good the "tuner" is, it can only react to changes & build code AFTER it has accumulated some metrics, whereas a programmer who is intimately familiar with his or her problem-space can prebuild tuned code for handling most of the expected cases.
I fully expect dedicated hackers to do what every programming freak does - use the provided tools most of the time, and where they want total control & performance, to write the VLIW directly (no matter WHAT the people who made the chip say).
Frankly, ignoring all the hype, this is just a RISCier RISC chip - what the original RISC folks were aiming for in the first place, but which has fallen by the wayside as they tried to compete with Intel.
There are several reasons why Transmeta doesn't want people coding for the native instruction sets. First of all, coding for a native instruction set will just give us the same problem as we have with x86 now -- too many applications to change the architecture, so crappy architecture ends up hanging around way longer than it should. Second, they stated that the instruction sets for the two chips are incompatible, so obviously there is no single "Transmeta Instruction Set". Third, they like the code morphing because it allows them to make fixes that can be downloaded. If people are coding apps to run natively, this can't be done.
But......
I have been thinking about this too and I'm wondering if it would be possible/logical to define some VLIW Instruction Set that could be used on all Transmeta chips, but would be faster and more efficient than translating x86. The CMS would still be translating from the "Transmeta Instruction Set" to the chip's native instruction set, so they could keep all the benefits as before.
Whadyall think?
pdubroy AT yahoo DOT com
In the article there's this paragraph:

"Now, let me just stop and say that a number of folks, in their effort to show that they've "seen it all before" and can't be taken in by the hype, have tried to compare Code Morphing to Alpha's FX!32 or to an emulation program like SoftWindows. Such comparisons are like comparing a MinuteMan missile to a bottle rocket. In this case, you should feel free to believe the hype; Code Morphing is cool."

I'd have to say that code morphing has been around. One only needs to look at Executor from ARDI. It dynamically recompiles 68k code into x86 code using an instruction generator. I think ARDI has a whitepaper on this on their site.

Besides that, there's not that much difference between FX!32 and code morphing from the software perspective, except that Crusoe has more hardware support for fixups (via the shadow register file and the gated store buffer), FX!32 runs offline instead of dynamically, and the threshold for code generation is much higher (FX!32 translates based on profile info; Crusoe probably only translates when there are enough blocks to make the translation overhead worthwhile).

In addition, there *has* been work on dynamic recompilation. That's essentially what a JIT is. Or you can look in the 1998 ASPLOS proceedings: there's a paper there describing such a system (called Shogun, I think). Unfortunately, the target arch didn't have all of Crusoe's aforementioned hardware hooks, so the performance isn't quite as high. Even VMware has done this stuff before; VMware started off as SimOS, which had a dynamic translation mode as well as an interpreted mode. It's just that no one had integrated the translator and added the hardware hooks to make it as efficient.
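For anyone who hasn't seen the "interpret first, translate once it's hot" idea in code, here is a toy sketch. It is not how Crusoe, FX!32 or Executor actually work: the two-instruction guest "ISA", the HOT_THRESHOLD value and the Morpher class are all invented for illustration, and "translation" here just means generating a Python function once so that later runs skip the decode loop.

```python
# Toy sketch; the guest "ISA", the threshold and all names are assumptions.
HOT_THRESHOLD = 3  # made-up value: translate a block after this many interpreted runs


def interpret(block, regs):
    """Execute one guest block by decoding every instruction each time."""
    for op, dst, src in block:
        if op == "add":
            regs[dst] += regs[src]
        elif op == "sub":
            regs[dst] -= regs[src]
        else:
            raise ValueError(f"unknown op {op!r}")


def translate(block):
    """Generate a Python function for the whole block, paying the decode cost once."""
    lines = ["def run(regs):"]
    for op, dst, src in block:
        if op not in ("add", "sub"):
            raise ValueError(f"unknown op {op!r}")
        sign = "+" if op == "add" else "-"
        lines.append(f"    regs[{dst!r}] {sign}= regs[{src!r}]")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["run"]


class Morpher:
    """Interprets cold blocks and switches to translated code for hot ones."""

    def __init__(self):
        self.counts = {}       # block address -> number of interpreted runs
        self.translated = {}   # block address -> generated function
        self.interpreted_runs = 0
        self.translated_runs = 0

    def run(self, addr, block, regs):
        fn = self.translated.get(addr)
        if fn is not None:
            fn(regs)
            self.translated_runs += 1
            return
        interpret(block, regs)
        self.interpreted_runs += 1
        self.counts[addr] = self.counts.get(addr, 0) + 1
        if self.counts[addr] >= HOT_THRESHOLD:
            self.translated[addr] = translate(block)
```

Run the same block five times and the first three go through the interpreter while the last two use the generated function. Where the threshold sits is the tuning knob the FX!32 comparison is about.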
.oOo.
As an instruction set, the x86 is pretty bad; however, it's easy to code for and easy to optimize for, which are its biggest strengths. As a mid-layer API for this device, it was probably a good choice -- x86 recompiles well on RISCy machines with lots of registers. PowerPC and others probably wouldn't. They have a lot of registers themselves and are more complex (plus the wide variety of x86 clones means that most people will likely shy away from dangerous instructions, whereas since the PPC is VERY standardized, many software packages could rely on subtle bugs in the silicon of the PPC. Believe me, bugs are the hardest part of any hardware architecture to emulate.)
ARDI has some fairly interesting whitepapers on their implementation of the 68k instruction set on x86. Keep in mind this is MUCH more difficult to do than the reverse. The 68k CPU has 16 registers, whereas x86 only has 8, etc.
The big problem with C is the same C source file compiled with the same compiler can often produce many wildly different results, and C doesn't solve the problem of hardware accesses, which almost always need to be done in a low-level language. This CPU/software will be beneficial to many companies due to the fact they will be able to reuse existing hardware, drivers and software with this. As long as the prices for the CPU get really low eventually, it could really lower the prices for PDA and hand-held computers. (Imagine playing a 3D accelerated game of Quake III on a hand-held machine)
I used up all my sick days, so I'm calling in dead.
can we run a Beowulf cluster with it? =-)
Seriously though. The biggest problems with Beowulfs are space and heat, and imagine low-heat low-space processors wedged in there. Makes me horny.
From the mind of the most famous poster in all of slashdot
...is how much faster this thing will run if it's not emulating an x86.
:-)
That is missing the point, IMHO. One of the reasons the chip kicks ass is because they can change the hardware and you can't tell. Write native VLIW on this pig and you're fucked if they change, just like all the other processors.
... this is coming from a guy who prefers assembly to high-level languages in 98% of cases. I think they really struck on something here, don't fuck it up by asking to write in the "native tongue" of this beast. Well, unless you're writing your own processor.
Okay. The Crusoe is fully x86 compatible. Great. But how about developing applications for this processor that skip the translation step, and are already written in the processor's native language? Think about a Distributed.net client written SPECIFICALLY for this processor, with no x86 instructions.....
I'm betting that would speed up apps tremendously. Even Linux....ported directly to Crusoe's native instruction set. The problem I see is, the processor is designed to run x86 out of the box. Code would have to be written to change the Flash ROMs on the processor to bypass translation and hit the core directly, or at least do a straight-through delivery. (Why translate VLIW to VLIW?)
(IF YOU DO THIS AND FRY YOUR CRUSOE, I'M NOT LIABLE.)
-- Give him Head? Be a Beacon?
(If you can't figure out how to E-Mail me, Don't. :P)
Who cares about Transmeta Beowulfs? With the low transistor count and low temp, this chip could do the same SMP-on-a-chip thing that IBM is planning for the PPC. The only reason to have Beowulf at all is that it's more economical than SMP systems; it's not a better solution than massive SMP IF massive SMP can be made cheaply. Of course, some organizations will have a need for Beowulf clusters of massively SMP systems...
...damn it, now I'm horny.
--Shoeboy
I have some concerns about the performance that the Crusoe processors will actually have. The article mentions that translated instructions will be cached and then reused if the CodeMorph software sees them again. However, it seems like the CodeMorph's state information will not be maintained between runs. If you power off the computer, the software loses the cached information and has to start from scratch again. In addition, the cache's size and location aren't given. Is it a small cache on die or is it located in system memory? The cache is probably on die for speed reasons, but this would limit the size of the cache. This could be a performance hit since the cache is also used as a data cache and instruction cache.
Another question concerns the way the instructions are being cached. For example, suppose the following instructions were given:
ADD AX, BX
SUB CX, AX
JNZ label
Would the translation for each instruction be cached, or is the sequence cached? The article implies that the sequence is cached since the CodeMorph software can optimize the speed on subsequent passes. However, this seems to limit the benefit gained from caching to relatively tight loops or common sequences of code depending on the cache size.
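Here is a small sketch of what sequence-level caching with a limited cache could look like. It is speculation in code form: the TranslationCache class, the LRU eviction policy and the toy_translate stand-in are assumptions for illustration, since nothing public says how the real software sizes or evicts its cache. Whole blocks are keyed by their guest entry address, so a tight loop is translated once and then hit on every later pass.

```python
from collections import OrderedDict


class TranslationCache:
    """LRU cache of translated blocks, keyed by guest entry address (hypothetical design)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, addr, guest_block, translate):
        """Return the translation for the block at addr, translating on a miss."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # mark as most recently used
            self.hits += 1
            return self.blocks[addr]
        self.misses += 1
        translated = translate(guest_block)
        self.blocks[addr] = translated
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return translated


def toy_translate(guest_block):
    """Stand-in for real translation: the whole sequence becomes one cached unit."""
    return tuple(f"native<{insn}>" for insn in guest_block)
```

A loop body executed 1000 times costs one translation and 999 hits. Two blocks alternating through a cache with room for only one miss every single time, which is the worry about cache size in a nutshell.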
On a side note, the article implies that the CodeMorph software is light-years beyond anything else. However, some of its highly touted features have appeared in other software before. For example, DEC's FX!32 would initially just translate code, but it would also observe the application's behaviour and then optimize the code based on that after the application finished executing. It could do this optimization several times, optimizing more aggressively on each pass. Apple's 680x0 emulator was likewise based in ROM and started up first so that the MacOS could boot. The CodeMorph software has some new features if it really does OO (out-of-order) scheduling and optimization on the fly, but that seems like a pretty big hit on performance.
If future server/desktop oriented processors implement large parts of the CodeMorph software in hardware, how will that be any different from AMD's or Intel's processors, apart from the Transmeta core being VLIW? They'll all be implementing a hardware instruction translation unit. Plus the transistor count and power consumption will skyrocket along with that.
"When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
Most, if not all, semiconductor manufacturers are really cool about this. The companies that were the coolest to me were: Analog Devices, Microchip Technologies, Maxim, National Semiconductor, TI, and Motorola among others.
All of those companies gave me precious device documentation and many of them gave samples as well. I used all of this in school and later in professional life ("we need a good low-power instrumentation amp." "I got a really cool one from AD which has great documentation, let's try it out and we can use them in volume (millions) later" "ok"). Semiconductor companies know the benefits of such behaviour and tend to act accordingly.
Embedded technologies are a very lucrative market that a lot of young people are jumping directly into (myself included). To deny the flow of information on your products would be like tying your own noose. I'm pretty sure Transmeta realises this.
Ask and ye shall receive.
Transmeta does NOT want us programming directly in Crusoe VLIW-native code. In fact, the opcodes will NOT be the same on the 3400/5400 chips, and will probably change for all future chips (each model/variation would need its own code morphing software).
The primary reason is that they don't want to have to make these chips backwards compatible. Intel has a lot of problems with this - even the newest Pentium IIIs must support programs written for 386s. Intel has a hard time because it can't change these opcodes, but instead has to add new ones - hence MMX, SIMD instructions, the Katmai extensions (the P3 stuff), etc (and similarly, AMD has added 3DNow! et al).
Transmeta wants the freedom to be able to drastically change newer models of the CPU to keep it running at optimal speed/efficiency. If they wanted to allow us to write Crusoe-native code, then they'd need morphing software that allows newer models to morph old code to its own (modified) native code. In other words, a real pain in the rear and definitely a problem if Crusoe can't run different "morphers" simultaneously (which I suspect it can't).
As for other morphing software to emulate other processors: I wouldn't be surprised if they allowed it to emulate some other chips - like the PPC, so it can run MacOS stuff - but it won't run nearly as well as x86 emulation will. The chip is meant to be able to morph code from many different platforms, but there are a lot of shortcuts to emphasize x86. I think that topic is addressed in the Ars Technica stuff, but basically Crusoe uses an FPU very similar to the x86 one. I think there are some other things for that in hardware, as well as the fact that we know they're dedicating most of their time to creating the x86 morphing software so it will be the most optimized.
I highly doubt that we'll be able to write our own morphers. I think that it's an extremely difficult thing to do, it would require knowledge of the Crusoe instruction set (which, as I said above, they don't want to release), and the morphing software is probably authenticated somehow. Since the morphing code is running in Flash ROM, it can be upgraded, but if someone tried to load a morpher that doesn't work they're gonna have trouble reverting back to x86.
Linus said that "Mobile Linux" is NOT a code fork - it's just the x86 version with a few modifications to make it run better on embedded platforms. Why reinvent the wheel?
Keep in mind that this is all SPECULATION - if anyone here has other information to the contrary, I'd like to hear it =)
-- Imagine how much more advanced our technology would be if we had eight fingers per hand.
...is how much faster this thing will run if it's not emulating an x86. It looks pretty hot under the hood, and it gets more interesting if, instead of relying on standard guess-aheads, you could tell it which branch to use as default or even tell it about branches ahead of time (which you often know well before the actual conditional looping operation), so it's not guessing at all.
There's all kinds of fun I could have with this chip...
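A quick back-of-the-envelope sketch of why a programmer-supplied default could matter. Nothing here reflects a documented Crusoe feature; whether the chip accepts branch hints at all is pure guesswork, and the branch pattern is invented. It compares a fixed prediction against a classic 2-bit saturating counter on a branch that is taken 9 times out of every 10.

```python
def static_mispredicts(outcomes, predict_taken):
    """Count mispredictions when the same fixed guess is used every time."""
    return sum(1 for taken in outcomes if taken != predict_taken)


def two_bit_mispredicts(outcomes, state=1):
    """2-bit saturating counter: states 0-1 predict not taken, states 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# Invented pattern: a branch taken 9 times, then not taken once, repeated 10 times.
outcomes = ([True] * 9 + [False]) * 10
```

On this pattern a "taken" hint and the 2-bit counter land close together, while a naive "not taken" default is wrong 90 times out of 100. The hint wins big only where the hardware guesser would otherwise start from the wrong default.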
I also wonder whether it can multitask between different instruction sets. I guess the task switching overhead would be pretty brutal if there isn't room onchip for multiple instruction sets.
They have essentially built a Japanese Compact Car that is fuel efficient, and not an Italian sports car.
Efficiency isn't exactly exciting. Unless I am using a Palm Pilot, I really don't care if my PentiumIII or Alpha is sucking 34W and my Nvidia GeForce is sucking another 30. What I care about is how fast my performance is. How many transactions can I run? How many frames per second am I getting? How many polygons can I push?
Crusoe may be important for the coming ubiquitous computing revolution (if it ever happens), but they are not the first to go after low power (remember Rise? Remember the IDT WinChip? Don't forget StrongARM).
I think Crusoe is a nice chip, but the *HYPE* (and I mean hype) caused by deliberate secrecy and press leaks thoroughly destroyed any chance of it being seen as revolutionary in my eyes.
The Code Morphing technology is not revolutionary. Emulators have been doing dynamic instruction set recompilation for years now: DEC did it with FX!32, Sun does it with Java JITs (including HotSpot, which does recompilation based on runtime profiles), Smalltalk VMs have been doing it, and hell, even one of the Commodore 64 emulators does it if I recall. John Carmack's Quake3 engine even does it. I'm sure there are hundreds of projects in academia that have been doing it. The only relevant difference is the hardware assist that the Crusoe has.
Chances are, when you hype something too much, it's going to be disappointing. There's a thread on Usenet that claims Transmeta's *ORIGINAL* goal was not low power, but the best performance, but when they couldn't attain it, they "fell back" to a low power selling point. I think it's in comp.arch.
That's the whole point of Crusoe: you DON'T code for it directly. It takes other instructions, starting with x86, runs them faster and better, and optimizes them on the fly.
The "code morphing" layer is what makes Crusoe stand apart from the rest. It optimizes the instructions it's running on the fly. This means that your apps will run faster and faster as they run. This layer is what gives the Crusoe its speed. Coding natively would be SLOWER than using the morphing layer. You also wouldn't get the benefit of the optimization.
Also, the instruction sets are different for each chip. Each set is further optimized for what its use is going to be. So if you code for one Crusoe chip natively, it doesn't run on the other. This lets Transmeta change the instruction set as needed. If it's faster to do something one way, they can change it and not break compatibility with anything. And they can give you the update with a software patch.
So, it doesn't matter if people don't have the instruction set for the native Crusoe processors. They will change a lot, and every time they change you would have to recode every program again. Why bother? Also, you don't get to use what the Crusoe processor is all about: its code morphing layer.
So, PLEASE, stop complaining that you can't code natively for this chip. The code won't go any faster, and as soon as Transmeta changes the set, your programs wouldn't run anyways. So it's a moot point to code natively for it.