IBM's OSS Code Morphing Code, or OSS vs. Transmeta
jjr writes: "It seems that IBM has an Open Source project called Daisy that does a lot of what Transmeta does. Their code-morphing technology supports PowerPC, x86, and S/390, as well as the Java Virtual Machine. They morph the [code] into VLWI just like Transmeta, but they still have some issues to work out. Other issues dealt with in the report include self-modifying code, precise exceptions, and aggressive reordering of memory references in the presence of strong MP consistency and memory mapped I/O."
On the other hand... imagine running Daisy atop Windows, sitting inside a VMWare instance on Linux on Daisy on an S/390!
If I'm not completely wrong, you can't run Linux on Daisy on an S/390. You could run Linux programs if you like, but not install the whole operating system.
But a funny idea :)
--
There is no such thing as gravity. The Earth just sucks.
If so, and if it's in the way of IBM, IBM might have to challenge those patents; nice.
They (IBM) might be successful since they're such a good customer of the patent office. Surely they will get some favourable treatment to overthrow those patents.
Btw, in this case I'd be on IBM's side, since here it is in-the-open research against patents. Aside from what one thinks about IP patents, research should never be limited by futile things like patents. But it would be ironic, since IBM, in other respects such a big user (and beneficiary) of the patent system, would kind of have to act against itself (i.e. the patent office) then.
Not really. Amiga's just implementing a thin virtual machine layer, providing an "ideal assembly language" that provides more control than, say, C, but still provides sufficient abstraction that the code can be targeted to a wide range of CPUs easily. (This is in stark contrast to, say, the Java VM, which is comparatively quite heavy.) You can think of Amiga's virtual assembly as a "medium level language", if such a term exists.
DAISY translates other ISAs into its own native Tree-VLIW ISA. Rather than providing an abstract assembly language that gets targeted to a wide variety of CPUs, DAISY is doing the reverse: take a wide variety of ISAs, and target them to this specialized CPU. Transmeta is similar, although they've chosen to focus primarily on x86 to get the biggest bang for their limited bucks.
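To make the "medium level language" idea concrete, here is a minimal Python sketch, assuming a hypothetical virtual instruction of the form `dst = a op b` and two made-up target syntaxes: a two-address, x86-style one and a three-address, RISC-style one. It only shows why one virtual instruction can expand differently per target; it is not Amiga's actual VM format.

```python
def lower_two_address(dst, op, a, b, scratch="tmp"):
    """Lower `dst = a op b` for a two-address target where the destination is also
    the first source (x86 style). `scratch` is assumed to be a free register."""
    out = []
    if dst == b and dst != a:
        # dst aliases the second operand: save it before dst gets overwritten.
        out.append(f"mov {scratch}, {b}")
        b = scratch
    if dst != a:
        out.append(f"mov {dst}, {a}")
    out.append(f"{op} {dst}, {b}")
    return out


def lower_three_address(dst, op, a, b):
    """Lower `dst = a op b` for a three-address RISC-style target: one instruction."""
    return [f"{op} {dst}, {a}, {b}"]


# The same virtual instruction, r3 = r1 - r3, on both hypothetical targets.
print(lower_two_address("r3", "sub", "r1", "r3"))   # ['mov tmp, r3', 'mov r3, r1', 'sub r3, tmp']
print(lower_three_address("r3", "sub", "r1", "r3"))  # ['sub r3, r1, r3']
```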
--Joe--
Program Intellivision!
BTW, Transmeta has been working on their stuff since 1995, so the technology mentioned in the 1997 paper doesn't strictly predate it.
I read about Daisy a few years back when I was studying VLIW scheduling techniques and whatnot. The DAISY VLIW is quite different from most VLIWs around. Their instruction word is built upon the ability to execute large numbers of "branches" in parallel every cycle. (As best as I can tell, these "branches" are actually closer to being composite predication conditions in many cases, which is why I put "branches" in quotes.) Their experimental physical implementation could execute something like 8 branches every cycle. Downright weird.
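The following is a minimal Python model of one tree instruction, written from the description above: conditions evaluated in parallel pick one root-to-leaf path, and only the operations on that path commit. The `Node` layout, register names and labels are hypothetical; DAISY's real encoding differs.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """One node of a tree instruction (hypothetical layout, not DAISY's encoding)."""
    ops: list = field(default_factory=list)          # (dst, fn): dst <- fn(regs)
    cond: Optional[Callable[[dict], bool]] = None    # None marks a leaf
    taken: Optional["Node"] = None
    not_taken: Optional["Node"] = None
    target: Optional[str] = None                     # leaf: next tree instruction


def execute_tree(root, regs):
    """Execute one tree instruction in a single 'cycle'.

    Every condition and operand reads the register values from the start of the
    cycle; only the operations on the selected root-to-leaf path take effect,
    and their results are written back together at the end.
    """
    start = dict(regs)
    writes = {}
    node = root
    while node.cond is not None:
        for dst, fn in node.ops:
            writes[dst] = fn(start)
        node = node.taken if node.cond(start) else node.not_taken
    for dst, fn in node.ops:          # the leaf's own operations
        writes[dst] = fn(start)
    regs.update(writes)
    return node.target


def three_way_example():
    """r1 = r2 + 1 on every path; then a three-way branch on the sign of r2."""
    positive = Node(ops=[("r3", lambda r: r["r2"] * 2)], target="L1")
    zero = Node(ops=[("r3", lambda r: 0)], target="L2")
    negative = Node(ops=[("r3", lambda r: -1)], target="L3")
    not_positive = Node(cond=lambda r: r["r2"] == 0, taken=zero, not_taken=negative)
    return Node(ops=[("r1", lambda r: r["r2"] + 1)],
                cond=lambda r: r["r2"] > 0, taken=positive, not_taken=not_positive)


regs = {"r1": 0, "r2": 5, "r3": 0}
print(execute_tree(three_way_example(), regs), regs)   # L1 {'r1': 6, 'r2': 5, 'r3': 10}
```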
A more traditional VLIW uses predication to convert short branches into a simple "if (cond)" prefix on individual instructions. (This technique is known as if conversion.) Also, traditional VLIW instruction words are flat -- all N instructions in a VLIW bundle execute together in parallel, with no tree structure implicit in the encoding.
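A small Python illustration of if-conversion, with predication modeled as a guarded write (a hypothetical `predicated` helper standing in for a hardware predicate bit). Both versions compute the same result; the second has no branch, so both guarded operations could be issued in the same instruction word.

```python
def branchy_abs_diff(a, b):
    """Original control flow: a conditional branch selects one of two paths."""
    if a > b:
        d = a - b
    else:
        d = b - a
    return d


def predicated(pred, old, new):
    """A predicated write: the new value lands only when the predicate is true."""
    return new if pred else old


def if_converted_abs_diff(a, b):
    """If-converted form: no branch; both operations issue, each guarded by a predicate."""
    p = a > b                        # compare sets a predicate register
    d = 0
    d = predicated(p, d, a - b)      # (p)  d = a - b
    d = predicated(not p, d, b - a)  # (!p) d = b - a
    return d
```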
All that aside, the DAISY scheduling techniques sound pretty similar to trace scheduling, which was used on the old Multiflow VLIW machines. The actual process of converting PowerPC instructions to individual DAISY operations is mostly search and replace, and preserving program order is a matter of constructing proper dependences between the instructions.
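Here is a rough Python sketch of the "constructing proper dependences" step, followed by a greedy packing of operations into fixed-width instruction words. Instructions are hypothetical `(name, dest, sources)` triples, every dependence is conservatively given a one-cycle latency, and the placement is a simplified, program-order form of list scheduling over straight-line code, not DAISY's or Multiflow's actual scheduler.

```python
def dependences(instrs):
    """For each instruction, the set of earlier instruction indices it must follow."""
    preds = []
    for i, (_, dst, srcs) in enumerate(instrs):
        p = set()
        for j in range(i):
            _, dst_j, srcs_j = instrs[j]
            if dst_j in srcs:      # read-after-write: true dependence
                p.add(j)
            if dst in srcs_j:      # write-after-read: anti-dependence
                p.add(j)
            if dst == dst_j:       # write-after-write: output dependence
                p.add(j)
        preds.append(p)
    return preds


def schedule(instrs, width=4):
    """Greedy placement into bundles of at most `width` operations.

    Each operation goes in the first cycle after all its predecessors that
    still has a free slot; every dependence is given a one-cycle latency.
    """
    preds = dependences(instrs)
    cycle_of = []
    bundles = []
    for i, (name, _, _) in enumerate(instrs):
        c = max((cycle_of[j] + 1 for j in preds[i]), default=0)
        while c < len(bundles) and len(bundles[c]) >= width:
            c += 1
        while len(bundles) <= c:
            bundles.append([])
        bundles[c].append(name)
        cycle_of.append(c)
    return bundles


program = [
    ("i0", "r1", ["r2", "r3"]),   # r1 = r2 + r3
    ("i1", "r4", ["r5", "r6"]),   # r4 = r5 * r6
    ("i2", "r7", ["r1", "r4"]),   # r7 = r1 - r4   (needs i0 and i1)
    ("i3", "r2", ["r8"]),         # r2 = r8        (must not overtake i0's read of r2)
]
print(schedule(program))           # [['i0', 'i1'], ['i2', 'i3']]
```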
Feel free to ask me questions if you're curious about this kind of stuff. It's my day job.
--Joe--
Personally I think it's a shame that while we all wait for these technologies to get economically viable the suburbs of the US, Canada and Australia are being filled with fuel-guzzling gasoline-powered four wheel drives, despite the fact their owners never take them off road :(
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
A child might fall down a well, and try to survive on the smelly water in daddy's cell phone battery.
It's a sign of how screwed up the US is that this incredibly remote possibility even has to be taken into account.
So if you had a Sparc -> Sparc binary translator you could make the thing run faster.
:)
On the Amiga, with Motorola's 68000, 68020, 68030, 68040 and a few 68060, someone actually released a binary patcher that attempted to patch binaries compiled for lower processors to make them faster (use new instructions, avoid ones emulated on the newer chips and thus slower, etc.) It also attempted to patch some sub-optimal cases often produced by the main C compilers in the market.
Tended to work pretty well...
OK, completely irrelevant, but I thought you might be interested anyway
deus does not exist but if he does
I'm talking about a program which takes a binary compiled for one processor as its input, and gives a binary native to another processor as its output (and then runs it). This way, you only translate once, rather than each time thru the loop.
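A minimal sketch of "translate once, then run": a hypothetical toy ISA is translated ahead of time into Python source, which is compiled a single time and then executed. Python stands in for the target machine code here, and the opcodes (`li`, `add`, `addi`, `bnez`) are made up for illustration. A real static translator also has to cope with computed jumps and code it cannot find ahead of time, which is one reason some systems mix translation with emulation.

```python
def translate_program(program):
    """Translate a whole toy-ISA program into Python source, once, ahead of time.

    Registers live in a list `r`; `pc` indexes instructions. Falling off the
    end of the program returns the register file.
    """
    if not program:
        raise ValueError("empty program")
    lines = ["def run(r):", "    pc = 0", "    while True:"]
    for i, ins in enumerate(program):
        op = ins[0]
        lines.append(f"        {'if' if i == 0 else 'elif'} pc == {i}:")
        if op == "li":                       # li rd, imm
            _, rd, imm = ins
            lines.append(f"            r[{rd}] = {imm}; pc = {i + 1}")
        elif op == "add":                    # add rd, rs, rt
            _, rd, rs, rt = ins
            lines.append(f"            r[{rd}] = r[{rs}] + r[{rt}]; pc = {i + 1}")
        elif op == "addi":                   # addi rd, rs, imm
            _, rd, rs, imm = ins
            lines.append(f"            r[{rd}] = r[{rs}] + ({imm}); pc = {i + 1}")
        elif op == "bnez":                   # bnez rs, target (absolute index)
            _, rs, target = ins
            lines.append(f"            pc = {target} if r[{rs}] != 0 else {i + 1}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    lines.append("        else:")
    lines.append("            return r")
    return "\n".join(lines)


def load(source):
    """Compile the translated source once and return the resulting function."""
    namespace = {}
    exec(compile(source, "<translated>", "exec"), namespace)
    return namespace["run"]


# Sum 5 + 4 + 3 + 2 + 1 into r1.
PROGRAM = [
    ("li", 1, 0),        # r1 = 0
    ("li", 2, 5),        # r2 = 5
    ("add", 1, 1, 2),    # r1 = r1 + r2
    ("addi", 2, 2, -1),  # r2 = r2 - 1
    ("bnez", 2, 2),      # loop back to instruction 2 while r2 != 0
]

native = load(translate_program(PROGRAM))   # translation happens exactly once
print(native([0, 0, 0, 0]))                 # [0, 15, 0, 0]
```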
MS can't even leave DOS behind, never mind ia32.
AFAIK, the 68k emulator that was used on the initial PowerPC Macs was in fact emulating a 68LC040 (which *doesn't* have an FPU). That's why, when the Power Macs first came out, there were 2 or 3 commercial apps that emulated the 68040's FPU and basically gave you a full 68040. I even think there was a shareware one that ran fine on a real 68LC040, but in order to run on a PowerPC it needed registration. As for that just-in-time compile stuff, damn, I never knew that... learn something new every day :)
The first emulator, as I understand it, was basically an interpreter, sort of like the Java virtual machine but where the "bytecodes" are 68000 instructions (I'm not sure which actual microprocessor was emulated, maybe it was the '020). Not real fast because you have to decode each instruction every time you hit it, but it was well-written and reliable.
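A toy illustration of why a pure interpreter is slow: every fetch re-decodes the instruction word, so a loop body is decoded once per iteration. The 16-bit encoding (4-bit opcode, 4-bit register, 8-bit signed immediate) and the opcodes are invented for the example and have nothing to do with real 68000 encodings.

```python
LI, ADDI, BNEZ, HALT = 0, 1, 2, 3


def encode(op, rd, imm):
    """Pack a toy 16-bit instruction: opcode[15:12], register[11:8], imm8[7:0]."""
    return (op << 12) | (rd << 8) | (imm & 0xFF)


def interpret(memory):
    """Fetch-decode-execute loop; returns (registers, number of decodes performed)."""
    regs = [0] * 16
    pc = 0
    decodes = 0
    while True:
        word = memory[pc]
        # Decoding happens on every fetch, even for an instruction seen many times.
        op, rd, imm = word >> 12, (word >> 8) & 0xF, word & 0xFF
        if imm & 0x80:
            imm -= 0x100                 # sign-extend the 8-bit immediate
        decodes += 1
        if op == LI:
            regs[rd] = imm
            pc += 1
        elif op == ADDI:
            regs[rd] += imm
            pc += 1
        elif op == BNEZ:                 # imm is a pc-relative offset
            pc = pc + imm if regs[rd] != 0 else pc + 1
        elif op == HALT:
            return regs, decodes
        else:
            raise ValueError(f"bad opcode {op} at {pc}")


# r1 = 10; do { r1 -= 1 } while (r1 != 0)
program = [encode(LI, 1, 10), encode(ADDI, 1, -1), encode(BNEZ, 1, -1), encode(HALT, 0, 0)]
regs, decodes = interpret(program)
print(regs[1], decodes)   # 0 22: the two-instruction loop body was decoded 10 times each
```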
Then there was the dynamic recompilation emulator which I believe first appeared in the first PCI Macs (like the 8500/120) and System 7.5.3 (not exactly sure if that's right but thereabouts).
This was like the JIT - "Just in Time" compilers for Java, it would compile 68000 code to PowerPC code and then execute the PowerPC code natively.
This was a shipping product I believe in late '95 and I'm pretty sure Apple was not the first to do such a thing.
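Unlike the interpreter described above, a dynamic recompiler translates each block of guest code the first time it is reached and caches the result, so later executions skip decoding entirely. A minimal sketch with a hypothetical tuple-based toy ISA, where Python closures stand in for the generated PowerPC code:

```python
def make_li(rd, imm):
    def step(r):
        r[rd] = imm
    return step


def make_addi(rd, imm):
    def step(r):
        r[rd] += imm
    return step


def translate_block(program, start):
    """Decode one basic block (straight-line code ending in bnez/halt) exactly once."""
    steps = []
    pc = start
    while program[pc][0] not in ("bnez", "halt"):
        op, rd, imm = program[pc]
        if op == "li":
            steps.append(make_li(rd, imm))
        elif op == "addi":
            steps.append(make_addi(rd, imm))
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    end = program[pc]
    fallthrough = pc + 1

    def block(r):
        for step in steps:
            step(r)
        if end[0] == "halt":
            return None                   # stop running
        _, rs, target = end
        return target if r[rs] != 0 else fallthrough

    return block


def run(program):
    cache = {}                            # guest pc -> translated block
    regs = [0] * 8
    pc, executed = 0, 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:                 # first visit: translate and remember
            block = cache[pc] = translate_block(program, pc)
        pc = block(regs)
        executed += 1
    return regs, len(cache), executed


# Branch targets are absolute instruction indices in this toy ISA.
PROGRAM = [("li", 1, 10), ("addi", 1, -1), ("bnez", 1, 1), ("halt",)]
regs, translated, executed = run(PROGRAM)
print(regs[1], translated, executed)      # 0 3 11
```

Three blocks are translated once each, while the loop block runs nine times without being decoded again.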
Note that on the Mac they were unable to rewrite much of the low-level OS code from 68000 to PowerPC, at least not initially, and so a lot of system software remained emulated and probably still does. Also, it is very common for Mac applications to install interrupt-time tasks, many of those are legacy 68k apps, and it would be inefficient to switch instruction set architectures all the time.
I seem to recall it takes something like 200 PPC instructions to switch from one architecture to the other so if you're already in 68k code and you're about to run a small routine it's best to remain emulated.
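The "stay emulated for small routines" rule is a break-even calculation. A back-of-the-envelope sketch: the 200-instruction switch cost is the figure recalled above, paid once on the way in and once on the way out, and the emulation overhead per instruction is a free parameter, since the actual slowdown of Apple's emulator isn't given here.

```python
def should_switch_to_native(emulated_len, native_len, emu_overhead, switch_cost=200):
    """True if leaving the emulator to run a native routine is cheaper.

    All costs are in native-instruction equivalents:
      stay emulated: emulated_len * emu_overhead
      switch:        switch_cost (in) + native_len + switch_cost (out)
    """
    stay = emulated_len * emu_overhead
    switch = 2 * switch_cost + native_len
    return switch < stay


def break_even_length(emu_overhead, switch_cost=200):
    """Routine length above which switching wins, assuming the native version has
    the same instruction count n as the emulated one: n * k = 2 * s + n."""
    if emu_overhead <= 1:
        raise ValueError("emulation must be slower than native code")
    return 2 * switch_cost / (emu_overhead - 1)


print(break_even_length(10))   # about 44 instructions if emulation costs 10x
```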
It is possible to write "fat" code that provides both options and the machine will use whichever one it's currently running - this is common for "Extensions" which make "fat patches" to OS calls, and many OS calls are "fat traps".
For this reason, the Classic System 7 MacOS (of which Mac OS 8 and OS 9 are examples, but Mac OS X is a whole different thing) handles hardware interrupts in emulated 68000 code.
Interrupt handlers and device drivers may be written in 68k code or PowerPC code as you like and run on a PowerPC machine.
The dynamic recompilation emulator I think emulates an '040 with its instruction cache issues, and it correctly handles hardware interrupts that happen in the middle of running a chunk of recompiled code.
Early Mac apps very commonly used self-modifying code. For example, if a "code resource" was expected to be loaded into memory and used by the system, many applications would load a small stub that jumped to an offset that was a placeholder. Then they would write an address in the running program code into the placeholder after it was loaded. This kind of thing screwed up on the 040 because you were writing to code using data instructions, but there were lots of workarounds such as the painful decision to flush the data cache after calling BlockMove - and the addition of the BlockMoveData call which wouldn't flush the cache.
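An emulator that caches translated code has the same problem as the 040's split caches: a store into code must invalidate whatever was built from the old bytes. A minimal Python sketch of that invalidation, assuming a flat word-addressed memory; the "translation" is just a snapshot of the words, standing in for generated native code, and the class name is hypothetical.

```python
class TranslationCache:
    """Caches translations of code ranges and drops any translation whose source
    words are overwritten: the software analogue of flushing a stale I-cache."""

    def __init__(self, memory):
        self.memory = memory
        self.blocks = {}          # start address -> (end address, translation)
        self.translations = 0

    def lookup(self, start, end):
        entry = self.blocks.get(start)
        if entry is None or entry[0] != end:
            # Stand-in for real code generation: snapshot the current code words.
            entry = (end, tuple(self.memory[start:end]))
            self.blocks[start] = entry
            self.translations += 1
        return entry[1]

    def write(self, addr, value):
        """A data store; any translation covering addr is now stale and is discarded."""
        self.memory[addr] = value
        stale = [s for s, (e, _) in self.blocks.items() if s <= addr < e]
        for s in stale:
            del self.blocks[s]


# A stub whose second word is a jump-target placeholder patched after loading.
memory = [0x10, 0x00, 0x20, 0x30]
cache = TranslationCache(memory)
cache.lookup(0, 2)                    # translated once
cache.write(1, 0x1234)                # program patches the placeholder
print(cache.lookup(0, 2), cache.translations)   # (16, 4660) 2
```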
Also note that an application (or any code) can install callbacks that are written in 68k, PPC code or fat, and this code will be correctly called from the OS or toolbox, whether it started in 68k or PPC. This works because of something called a "routine descriptor" that is a compact description of a function API - it handles Pascal vs. C calling conventions, instruction set architectures, and the possibility of providing alternative entry points for each architecture.
On 68k there is a "trap" - a defined illegal instruction, that causes a jump to an exception handler. The exception handler reads in the routine descriptor and does the right thing. On PPC, you pass the routine descriptor to the CallRoutineDescriptor function (or something like that).
68k code is legacy and knows nothing about routine descriptors, but the emulated processor handles traps correctly. Because PowerPC support was released after the routine descriptor architecture was fully implemented, developers can easily put descriptors directly in their code. There are headers with macros that make most of this transparent, so you can compile both kinds from one set of sources.
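The dispatch logic behind a routine descriptor can be sketched in a few lines of Python. The names here (`Entry`, `RoutineDescriptor`, `call_routine`) are hypothetical, not the Mixed Mode Manager's API, and the calling-convention part (Pascal vs. C) is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Entry:
    isa: str            # "68k" or "ppc"
    fn: Callable


@dataclass
class RoutineDescriptor:
    entries: List[Entry]    # one entry for plain code, two for "fat" code


def call_routine(desc, current_isa, *args):
    """Call through a descriptor. Returns (result, mode_switch_needed).

    Prefer an entry for the ISA already running so no mode switch is needed;
    otherwise use whatever entry exists and report that a switch happened.
    """
    if not desc.entries:
        raise ValueError("empty routine descriptor")
    for entry in desc.entries:
        if entry.isa == current_isa:
            return entry.fn(*args), False
    return desc.entries[0].fn(*args), True


fat = RoutineDescriptor([Entry("68k", lambda x: x + 1), Entry("ppc", lambda x: x + 1)])
legacy = RoutineDescriptor([Entry("68k", lambda x: x * 2)])
print(call_routine(fat, "ppc", 41))      # (42, False)
print(call_routine(legacy, "ppc", 21))   # (42, True)
```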
Michael D. Crawford
GoingWare Inc
-- Could you use my software consulting serv
Dude, what the hell is that?
It looks like BASIC and C on crack!
Oh wait...
Is that Python?
It's kind of cool; they actually sample executing code (including the kernel) at regular intervals by interpreting some instructions from the instruction stream instead of just recording the instruction pointer. This enables them to gather statistics about the outcome of instructions, the physical location of load/store instructions, whether the instruction hit in the cache, how long it took to execute the instruction, and so on.
There is supposedly a downloadable evaluation version of the software at their website (the catch, of course, is that it only works on Alphas running Tru64 Unix or Windows NT).
My CS BSc third-year project is to do this for x86, so if I am lucky, you may have a way to make your Linux box faster in a year or two :-).
My biggest problem has been with the hideous instruction set x86 provides.
There are no apps available for this configuration either. I think there was Notepad and Solitaire... and that was about it.
EverCode
True for LCD, but why limit yourself to one technology? There's no reason a screen has to emit light at all. After looking at several flavors of "electronic paper" it doesn't seem particularly fanciful to imagine a display which consumes zero power if the image isn't changing and which is readable under the same wide variety of conditions as regular paper. It may well be that such displays will always lag behind more conventional technologies in areas such as transition time or color depth, but for a very wide variety of devices and applications that would still be a big win.
Even within the realm of light-emitting display technology, there's plenty of room to reduce power consumption. For example, the Light Emitting Polymer work at CDT could lead to displays that consume a lot less power than CRT or LCD displays, in addition to being extremely thin, light and flexible.
I'm not trying to argue with you here. I completely agree with your main point that power consumption needs to be addressed beyond the CPU. Displays and rotating media in particular are at least as deserving of attention. This is all just FYI.
Slashdot - News for Herds. Stuff that Splatters.
Does anyone still make monochrome laptops? I'd like the additional battery life that might give.
-------
CAIMLAS
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Congratulations dieman!!! Your post has now become the technical bulk of another DAISY/Crusoe article from ZDNet.
Apparently, no one at IBM or Transmeta would call ZDNet back, so the writers came to /. I commend them for knowing where to get quality information; but, considering that they quoted dieman word-for-word, I wonder if they even read the white paper themselves. I'm betting that they couldn't understand it and decided that obviously dieman could - just look at all those big words - so he seemed like a good source. I haven't gotten around to reading the white paper yet myself, so I hope you got all of it right, dieman.
One other thing: the ZDNet article mentioned something about Crusoe doing parallel processing, and I believe they mean internally, not just multiple processors on a board. I haven't seen anything anywhere indicating Crusoe or DAISY is capable of true parallel processing. Has anyone seen anything about this, or are the ZDNet writers drinking their glow-stick juice again?
I'm out of my mind, but feel free to leave a message.
yawn.. use bochs.. it's emulation, it's slow, but it's GPL.
How we know is more important than what we know.
ST: Anyone else immediately think of 2001 when they saw the project was named Daisy?
Marxism is the opiate of dumbasses
no I think that more likely had something to do with the amount of vapour in the air and the free falling NASDAQ.
IMO, the best approach would be a hybrid, where the code morphing could use the intermediate representation as a binary form to generate machine language, and then optimize it using runtime profiling and scheduling based on techniques already used by current compilers on IR trees.
How is this different from a really good Java JIT compiler?
"in an html comment at transmeta.com, so they even own a trademark on the words "code morphing", bad bad ibm"
Read the page. IBM doesn't use the term.
Flywheel energy storage looks particularly promising, even for powering something as small as a laptop.
Isn't it interesting that IBM dropped their line-up of Crusoe laptops a few months ago? Could IBM possibly be working on their own non-natively x86 processor? It's a great time to be alive!
-C
"This above all, to thine own self be true"
HotSpot JITs. Sun pretty much invented the dynamic compilation world.
It occurred to me that profiling a kernel like I suggested is a problem because the kernel can disable interrupts (as when handling an interrupt) and so even though you might be able to sample to some extent it may be hard to get good results. Also you crash the machine, etc.
But I recall reading recently here that someone had the Linux kernel running as a user space program. So you boot a real linux kernel, then run a fake kernel inside of some kind of hardware emulator or something. It was suggested to use this for kernel development - you could quit the kernel and restart it much quicker than rebooting and there's less danger of corrupting your machine, if your test machine is also your user machine, as is all too often the case.
But with this you could easily profile a userspace kernel, interrupting it from the outside without the test kernel being aware it's being interrupted, as those interrupts are handled not by the test kernel but by external code.
Of course, you'd want this to work for ordinary programs first. Let the kernel be your fourth year project!
Michael D. Crawford
Actually, you're wrong.
Linux, the whole operating system, installs and runs on s/390.
If Daisy runs under Linux, then you could conceivably have an S/390 (and can we be friends if you do?), run Linux, run Daisy, run Windows, run Daisy, run Mac OS X, etc....
A host is a host from coast to coast, but no one uses a host that's close
> cool patent
-- MartinG To mail me: echo kewyjlcxyzvjfxbqwh | tr bcefhjklqvwxyz
Many emulators (e.g. Apple's 68k emulator, VirtualPC, HP's Dynamo, probably others) use very similar techniques to DAISY. Dynamic recompilation, dynamic translation, code morphing: the names change, but it's not a new idea.
I see that you also got suckered by the story; DAISY actually does not support x86 and it produces VLIW code, not VLWI (whatever that is).
You might want to look at the GNU Rope (a.k.a. grope) project. I'm not sure if it would help or not, but it sounds similar.
I don't know if anybody is working on it currently, but here's an article about it:
http://lwn.net/1998/1029/als/rope.html
-- Wodin
A color LCD of usable brightness (another huge drain on battery life) is going to output a certain amount of energy.
I agree, but I for one (and I'm sure there are others out there) would be happy to get a greyscale screen if I could get an increase in battery life for it. Are there any decent laptops out there with black and white screens?
--saint----
It occurs to me that there's a third possible way: rather than doing the emulation step by step as the program runs, step thru the whole compiled program and convert it to native code just once, and then run it natively from then on, rather than re-emulate it each time thru the loop.
How come nobody is doing it that way?
FYI, one was SoftFPU, a control panel. Was dog-slow, but the only way to get some apps running on Performas and LC4xx's.
Didn't Transmeta patent anything from this code-morphing tech?
So, does it mean that with this technology I can run Microsoft Windows ME and Microsoft Windows NT 4.0 and Microsoft Windows 2000 Server on PowerPC (including Macintosh PPC) and IBM S/390?
No, I'm not trolling. I'm just curious since there has been this sort of "hardware emulation" trend going on recently.
-- Microsoft, Inc. http://www.microsoft.com
Code morphing is a great way to transition to VLIW, but dynamic translation and parallelization will always be slower than native processes. Are there any other ways that we as a community can start moving away from the old x86? I am sick of only having 4 registers when assembly programming!
Why doesn't the industry start concentrating on making devices other than the processor energy efficient? It would also mean we aren't pushing battery technology so hard, because that field seems to be lagging badly.
Lysergic Acid Diethylamide, not just chemistry, reality!
What makes Transmeta special is that they have put a dynamic binary translator in a chip and have developed silicon to make it faster.
No, actually you have it backwards. Intel (and later NextGen, AMD and I believe Cyrix) put a dynamic ISA translator *on their chips* starting with the P6--they decode (i.e. translate) x86 instructions into internal "u-op" instructions (AMD calls them "macro-ops", same idea) which are used by the rest of the silicon. (This is necessary because x86 instructions are too heterogeneous in length and complexity to work well in a deeply-pipelined out-of-order core.)
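For example, an x86 instruction with a memory destination is a load, an ALU operation and a store rolled into one. A toy Python sketch of that cracking step; the textual operand form and the temporary register `t0` are assumptions, and real Intel/AMD micro-op formats are internal and differ.

```python
def crack(instr):
    """Split one x86-style (op, dst, src) instruction into RISC-like micro-ops.

    Operands in square brackets are memory references; `t0` is an assumed
    internal temporary register.
    """
    op, dst, src = instr
    if dst.startswith("["):                       # memory destination: read-modify-write
        addr = dst[1:-1]
        return [("load", "t0", addr), (op, "t0", src), ("store", addr, "t0")]
    if src.startswith("["):                       # memory source: load, then ALU op
        return [("load", "t0", src[1:-1]), (op, dst, "t0")]
    return [(op, dst, src)]                       # register-register: one micro-op


print(crack(("add", "[ebx]", "eax")))
# [('load', 't0', 'ebx'), ('add', 't0', 'eax'), ('store', 'ebx', 't0')]
```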
What Transmeta did was essentially move this translator *off* the chip, into software. The advantage of this is simpler silicon, and therefore lower power consumption. (Also, all things being equal, higher maximum clock speeds; all things are clearly not equal.) A secondary advantage is that far more resources (16MB IIRC) can be devoted to buffering, tracing, analyzing and optimizing the instructions than on a chip, where the physical chip-size keeps buffers small and optimizations simple. The disadvantage is that all this needs to be run on general-purpose (i.e. slower) silicon--and worse, competes for CPU-time with the very programs it is trying to optimize. (Not to mention takes up 16MB of system resources.)
So far the tradeoff has been (IMO) a big loser except in special circumstances--where you need long battery life, x86-compatibility (otherwise there are faster, smaller, more efficient chips out there, like anything in the ARM family), little weight (otherwise just use a bigger battery), and have efficient enough components for the rest of the system to actually make a difference (this is the gotcha with traditional laptops). Whether this particular set of circumstances will turn out to be a small or huge market niche, it is certainly a small problem space. Of course, much of the blame is due to TM's implementation rather than the (basically sound) idea; apparently their architecture is not up to Intel's standards (their process technology is IBM, so that's not the problem). Of course, mistakes are very common in the first iteration of a wildly new idea--witness Itanium (harnessing VLIW for very different ends--and arguably with less success) for proof of that.
You will still have a need for OS specific apps, but so much of the customer cost in OS replacement is in replacing the apps for the OS.
Different emulator codes could be optimised for different classes of program: for example, games and productivity suites have different requirements, and thus could use different emulators. You could can the games emulator to stop people running games at work :)
There are so many possibilities that we missed because of the `ILOVEYOU' affair with Windows :(.
OS/2 - because choice is a terrible thing to waste.
This sounds similar to what Amiga is doing.
Do your best, hope for the best, suspect the worst.
This was in either late '95 or early '96 - but the IBM work on this had been around for a while by the time I read the paper.
This technology is widely available now - read all the way to the end to see how you can try it out.
If you have a jump to a certain offset in a routine, you can move the code where you jump to elsewhere in the file and change the offset you give in the jump. Complicated, because you need to parse RISC machine code, but doable.
It's made a little easier by PowerPC instructions always being fixed at 32 bits with no extension words (a side effect of that is that there's no way to load a 32-bit constant into a register with a single instruction, which makes it hard to scan machine code by eye for constants in an assembly debugger.)
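The usual two-instruction idioms for a 32-bit constant are `lis` + `ori`, or `lis` + `addi` with the `@ha`/`@l` halves familiar from PowerPC assembly. The subtlety is that `addi` sign-extends its 16-bit immediate, so the high half must be adjusted. A small Python sketch of the arithmetic (function names are illustrative):

```python
def lis_ori(value):
    """Halves for `lis rD,hi` then `ori rD,rD,lo`; ori zero-extends, so no adjustment."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def lis_addi(value):
    """Halves for `lis rD,ha` then `addi rD,rD,lo` (the @ha/@l pair).

    addi sign-extends its 16-bit immediate, so when bit 15 of the low half is
    set, the low half is negative and the high half is bumped up by one.
    (rD must not be r0 here, since addi reads r0 as a literal zero.)
    """
    value &= 0xFFFFFFFF
    lo = value & 0xFFFF
    if lo & 0x8000:
        lo -= 0x10000
    ha = ((value - lo) >> 16) & 0xFFFF
    return ha, lo


def rebuild_ori(hi, lo):
    return ((hi << 16) | lo) & 0xFFFFFFFF


def rebuild_addi(ha, lo):
    return ((ha << 16) + lo) & 0xFFFFFFFF


print(lis_addi(0x12348000))   # (4661, -32768), i.e. ha = 0x1235, lo = -0x8000
```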
This has the effect of speeding up the overall program execution because you group frequently used code blocks together in the executable file, and also in memory once it's loaded. You may find less-commonly used branches of an if-statement put miles away at the end of the file, so that you jump a long ways away and then back in sometimes, but this isn't a big deal because all the frequent cases flow straight along.
The reason this is a big win is twofold. First, you reduce virtual memory paging and the code resident in physical memory because less commonly used code is all grouped together and just sits idly paged out on disk; that which is taking up valuable physical RAM is of a minimum size and being used actively.
Also (and more importantly in small programs, and in CPU-bound cases), you make more effective use of your processor's code cache.
This is because jumping over an uncommonly used branch may load a few unused instructions into the cache at the beginning and end of the branch that's not taken - cache lines (blocks) are of a fixed size and are always aligned by the cache block size, so if you have 32 byte cache lines then the start of any cached code falls at a physical address that is divisible by 32.
If you run even one instruction in that address range, you load 32 whole bytes of code into the cache, evicting 32 bytes of code that might be useful later; and if your code is not optimized this way, you'll just end up jumping over most of what you loaded.
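The alignment arithmetic is easy to check. A short Python sketch, assuming 32-byte lines as in the example above: it lists the aligned lines a range of code occupies and how many fetched bytes go unused.

```python
def cache_lines_touched(start, length, line_size=32):
    """Aligned base addresses of the cache lines covering [start, start + length)."""
    if length <= 0:
        return []
    first = start - start % line_size               # round down to a line boundary
    last_byte = start + length - 1
    last = last_byte - last_byte % line_size
    return list(range(first, last + 1, line_size))


def wasted_bytes(start, length, line_size=32):
    """Bytes brought into the cache that the code range itself does not use."""
    return line_size * len(cache_lines_touched(start, length, line_size)) - length


# Three 4-byte instructions that straddle a line boundary pull in two full lines.
print([hex(a) for a in cache_lines_touched(0x101C, 12)], wasted_bytes(0x101C, 12))
# ['0x1000', '0x1020'] 52
```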
Many people who are trying to make their programs run faster would benefit from knowing more about how the cache works. Gary Kacmarcik's Optimizing PowerPC Code has a good discussion of this that will benefit anyone who programs on modern microprocessors - not just PowerPCs. And while Kacmarcik emphasizes PowerPC assembly, most of the benefit of improving cache use you can do from C, C++ or another higher level language.
The way the profiler works is that an interrupt-driven task is used to check the instruction counter at frequent but random intervals. The samples are saved to a file for later analysis, then a postprocessor makes a histogram which gives the number of samples per basic block of instructions.
(A basic block, essentially, is any code that falls between a pair of curly braces if it came from original C source code. It's more complicated than that in practice, but basically it's a chunk of machine code that has one entry point and one exit. It's possible to analyze machine code with a program and divvy it up into basic blocks.)
Then basically what you do is sort the machine code, with the most frequently used basic blocks coming earlier in the file.
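The whole pipeline (split code into basic blocks, histogram the samples per block, put hot blocks first) fits in a short Python sketch. Instructions are hypothetical `(opcode, branch_target)` pairs indexed by position, the set of branch opcodes is an assumption, and only the new block order is computed; a real tool must also rewrite branch offsets after moving code.

```python
from collections import Counter

BRANCHES = {"b", "beq", "bne", "ret"}     # assumed set of block-ending opcodes


def find_blocks(instrs):
    """Split instructions into basic blocks, returned as half-open (start, end) ranges.

    Leaders: the first instruction, every branch target, and every instruction
    that follows a branch.
    """
    leaders = {0}
    for i, (op, target) in enumerate(instrs):
        if op in BRANCHES:
            if target is not None:
                leaders.add(target)
            if i + 1 < len(instrs):
                leaders.add(i + 1)
    starts = sorted(leaders)
    return list(zip(starts, starts[1:] + [len(instrs)]))


def block_histogram(blocks, samples):
    """Count the sampled program-counter values that land in each block."""
    counts = Counter()
    for pc in samples:
        for block in blocks:
            if block[0] <= pc < block[1]:
                counts[block] += 1
                break
    return counts


def hot_first_layout(blocks, samples):
    """Most frequently sampled blocks first; ties keep their original order."""
    counts = block_histogram(blocks, samples)
    return sorted(blocks, key=lambda b: -counts[b])


program = [
    ("cmp", None),        # 0
    ("beq", 4),           # 1  rarely taken: jump to the error path
    ("add", None),        # 2
    ("b", 6),             # 3
    ("log_error", None),  # 4  cold error path
    ("nop", None),        # 5
    ("ret", None),        # 6
]
samples = [0, 1, 2, 3, 6, 0, 2, 6, 1, 3, 6, 5]
blocks = find_blocks(program)
print(blocks)                              # [(0, 2), (2, 4), (4, 6), (6, 7)]
print(hot_first_layout(blocks, samples))   # [(0, 2), (2, 4), (6, 7), (4, 6)]
```

The cold error path ends up last, which is exactly the "put it miles away at the end of the file" effect described earlier.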
Note that the profiling process depends necessarily on the use to which the program is put during the sampling. For best results, you might actually want to prepare several separate binaries of the same program, each optimized for a different purpose. Or you might want to construct test data or a test script that gives you a good overall average performance.
Now, how do you get this tool? It's more than just theory. It's available for IBM RS/6000s, although I don't remember what they call it.
But if you can spare the cash for an iMac you can get it included with the Macintosh Programmer's Workshop - MPW. The particular tool that's used for this is called MrPlus, which is discussed in Apple's Technote 1174 and Technote 1066.
I believe a variant of this is available in the Metrowerks Codewarrior development environment for PowerPC (CodeWarrior also supports Windows, Linux via GCC and lots of embedded systems but I believe the code reordering is only available for PowerPC).
CodeWarrior provides both an IDE (on Windows there's a choice of MDI user interface or Mac style with a global menu bar and free windows, which makes me much happier when I program on Windows) and it also provides command line tools, including the entirety of MPW with mwcc preinstalled so you can do "make" style builds on the MacOS (but with a weird makefile syntax). I can't find any mention of this on Metrowerks' website. I'll ask their friendly support guy if I'm correct about this.
Perhaps you're lusting over using this for Linux. It would certainly be interesting to try using this on the kernel - build the kernel, boot the machine off it, run it for a while under a normal load while you run the instruction pointer sampler, then reorder the instructions in the kernel and boot off the new kernel and you run faster!
This would probably be easiest to do on PowerPC Linux given the availability of published information from IBM and Apple about it, but I don't see why you couldn't do it for any instruction set. Some would just be harder to parse or rearrange correctly than others.
Stop drooling and start studying.
Michael D. Crawford
Yawn... a translator from x86 to a RISC-like instruction set inside the processor is all well and good, and it does indeed fit the definition of a dynamic binary translator. Good point.
Hotspot works by JITting only parts of the code - the parts that get run a lot (hotspots) - and tuning & optimizing these for speed. For the rest of the code, it's an interpreter. Because of this, startup time is low (no need to JIT the entire Swing toolkit, XML parser, CORBA, etc. etc. when you start up a complex app that uses all of these) and execution speed is typically at least as good as for a JIT.
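The mixed-mode scheme described above (interpret everything, compile only what turns out to be hot) can be sketched as a per-method call counter. The threshold, the class name and the `interpret_fn`/`compile_fn` hooks are hypothetical; HotSpot's real heuristics are considerably more involved.

```python
class MixedModeRunner:
    """Interpret methods until their call count reaches a threshold, then compile them."""

    def __init__(self, interpret_fn, compile_fn, threshold=3):
        self.interpret_fn = interpret_fn    # (name, *args) -> result
        self.compile_fn = compile_fn        # name -> callable
        self.threshold = threshold
        self.counts = {}
        self.compiled = {}
        self.modes = []                     # how each call was executed

    def call(self, name, *args):
        if name not in self.compiled:
            self.counts[name] = self.counts.get(name, 0) + 1
            if self.counts[name] < self.threshold:
                self.modes.append("interpreted")
                return self.interpret_fn(name, *args)
            # Hot method: compile it once; this and later calls use the compiled code.
            self.compiled[name] = self.compile_fn(name)
        self.modes.append("compiled")
        return self.compiled[name](*args)


methods = {"square": lambda x: x * x, "setup": lambda: "ok"}
runner = MixedModeRunner(
    interpret_fn=lambda name, *args: methods[name](*args),  # stand-in interpreter
    compile_fn=lambda name: methods[name],                  # stand-in compiler
)
runner.call("setup")                       # cold: interpreted once, never compiled
results = [runner.call("square", n) for n in range(5)]
print(results, runner.modes)
# [0, 1, 4, 9, 16] ['interpreted', 'interpreted', 'interpreted', 'compiled', 'compiled', 'compiled']
```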
Don't you wish you could talk to those managing transmeta directly? I'd love to point at articles like this and say, "I told you so." They are a good year ahead of any competition, but unfortunately their products are still too pricey and too slow. Since Transmeta refuses to open source their code morphing capability I'll put my money and support behind IBM or whomever writes software to give me the functionality Transmeta doesn't even want to give its customers.
I want a system that can change its instruction set on the fly, or at least in PROM or BIOS. I want a system I can run Solaris, OSX, Linux, IRIX and wintendoze on natively at near hardware speeds. It would also be nice if this could be a portable system, but that's not nearly as much a requirement. Transmeta refuses to write additional code morphing software for the UltraSPARC, MIPS, PPC, etc. instruction sets. So as far as I'm concerned they can be consumed by AOL or the next big monopoly. I won't shed a tear.
Digital (Compaq) developed an x86 Dynamic Binary Translator running on Alpha called FX!32. FX!32 won Byte Magazine's "Best Technology" award at Fall Comdex '95.
Dynamic in this case means that some code is emulated on the fly, and some is translated. This approach was pioneered for bytecode systems in Smalltalk implementations in the 80's, and of course is now used in Sun's HotSpot and other dynamic adaptive JVMs.
Static binary translators have been around for even longer, and were used (among other things) for running VAX programs on Alpha. A useful overview of this sort of technology appeared in the Digital Technical Journal 4:4 (1992). HP also performed binary translation between the HP3000 and the Precision architecture, but I can't find on-line info on that, just a citation to a paper article (1987). There is also a useful survey article on static and dynamic binary translation.
What is presumably novel in Transmeta's approach is that their instruction set architecture (ISA) is tuned specifically for dynamic translation (see page 12ff of Transmeta's paper The Technology Behind Crusoe Processors). Some microcode architectures have been designed specifically for general emulation (most have been tuned for a particular macroinstruction ISA), e.g. the early Lisp Machines (1976-81).
Yeah that was the shareware version. i did a little digging and the commercial app i was thinking of is/was called PowerFPU.
This is pretty neat, how it converts PPC and x86 code into VLIW. It is a good way to see how efficient DAISY's tree-VLIW instruction approach would be on currently compiled code. However, there is some latency the first time each block of code runs, so it is difficult to tell how much this will slow things down on the first pass through each block of code.
This could mean that upgrading architectures could be possible while still retaining backwards compatibility. Isn't it about time Microsoft left the x86 instruction world and embraced the newest technology available? This would be like Apple's transition to PPC, although unlike Apple, they wouldn't need to write a software emulator for older software; they could simply use DAISY to morph the code.
Does anyone know how DAISY compares with software emulation in terms of speed? I'm guessing it is a great deal faster.
Yes... that is because Daisy is a DYNAMIC BINARY TRANSLATOR... say the words with me. What makes Transmeta special is that they have put a dynamic binary translator in a chip and have developed silicon to make it faster. At this very moment I am doing maintenance work on a Pentium -> Sparc dynamic binary translator. Getting x86 floating-point instructions to work is a bitch, but for some reason the compress95 benchmark needs floating point to generate data in the test harness, even though it's an integer benchmark.
How similar is DAISY to Transmeta?
According to their white paper, Transmeta uses dynamic binary translation to convert x86 code into code for Transmeta's internal architecture. This is similar in concept to the current version of DAISY which converts PowerPC code into code for an underlying DAISY VLIW machine. DAISY was developed at IBM independently of Transmeta. The DAISY research project focuses less on low power and more on achieving instruction level parallelism in a server environment and on convergence of different architectures on a common microprocessor core. A more detailed comparison of the DAISY and Transmeta approaches will be possible after Transmeta publishes their techniques in more detail.
-- dieman - Scott Dier
-------
CAIMLAS
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Does this mean that (if all goes well) the technology could be used to blend the stability of Linux with the availability of MS apps? Given Big Blue's half-hearted attempts at putting out Linux desktops/laptops, could this be a sign of good things coming down the road?
Are YOU listed?
RE: prior art, please read Transmeta's patents before flying off the handle here. To the best of my knowledge, TM didn't patent dynamic translation, but they have several patents on optimizations for dynamic translation. Most of them are particularly suited to the situation where the hardware can be specifically designed to help out the translator, like the shadow registers used to ensure precise exception behavior.
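The shadow-register idea can be illustrated in software. Translated code updates a working register set; at the end of a region the working set is committed to a shadow (precise) copy, and if a guest instruction faults, the working set is rolled back and the region is replayed one guest instruction at a time so the fault is reported at the exact instruction. This is a minimal sketch under stated assumptions: a toy register-only ISA, and hypothetical names (`ShadowedRegs`, `run_region`, `GuestFault`). Real hardware also has to buffer memory stores, which this sketch omits, and it is not Transmeta's patented design.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, str, str]  # (opcode, dest register, source register)


class GuestFault(Exception):
    """Raised when a guest instruction faults (here: integer division by zero)."""

    def __init__(self, index: int):
        super().__init__(f"fault at guest instruction {index}")
        self.index = index


class ShadowedRegs:
    """Hypothetical working/shadow register pair with commit and rollback."""

    def __init__(self, regs: Dict[str, int]):
        self.shadow = dict(regs)   # last committed, architecturally precise state
        self.working = dict(regs)  # speculative state updated by translated code

    def commit(self) -> None:
        self.shadow = dict(self.working)

    def rollback(self) -> None:
        self.working = dict(self.shadow)


def step(regs: Dict[str, int], instr: Instr, index: int) -> None:
    op, d, s = instr
    if op == "add":
        regs[d] += regs[s]
    elif op == "div":
        if regs[s] == 0:
            raise GuestFault(index)  # checked before any register is modified
        regs[d] //= regs[s]
    else:
        raise ValueError(f"unsupported opcode {op}")


def run_region(state: ShadowedRegs, region: List[Instr]) -> int:
    """Return -1 if the region completes, else the index of the faulting instruction."""
    try:
        # A real translator would run reordered/optimized code here; this runs in order.
        for i, instr in enumerate(region):
            step(state.working, instr, i)
        state.commit()
        return -1
    except GuestFault:
        state.rollback()  # discard all speculative effects of the region
        for i, instr in enumerate(region):
            try:
                step(state.working, instr, i)
            except GuestFault as fault:
                state.commit()  # precise state: everything before the fault, nothing after
                return fault.index
            state.commit()
        return -1
```

For example, running `[("add", "a", "c"), ("div", "a", "b"), ("add", "c", "c")]` with `b = 0` reports a fault at index 1, and the committed state contains the first `add` but not the third instruction.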
A nice link to a readable and somewhat technical overview of fuel cells.
http://www.memagazine.org/contents/current/features/pems/pems.html
A nice Scientific American article.
http://www.sciam.com/explorations/122396explorations.html
Two nice links to NEC's proton polymer battery.
Asian Biz Tech article.
EE Times article (short and sweet).
I'm still waiting for the car that runs on happy thoughts and chocolate that Jon Stewart promised me.
--Jimmy has fancy plans; and pants to match.
A lot of the speed is lost in parsing the source. If the source were tokenised, the way old BASIC interpreters did it, then it would probably run a lot faster.
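Tokenising in the old-BASIC sense means replacing each keyword with a single byte when a line is stored, so the interpreter dispatches on one byte instead of re-scanning keyword text every time the line runs. A minimal sketch, assuming ASCII input and a made-up token table (the byte values below are illustrative and do not come from any particular BASIC). Keywords inside string literals are left alone.

```python
from typing import Dict

# Hypothetical token table; real BASICs each had their own values.
KEYWORDS: Dict[str, int] = {"PRINT": 0x80, "GOTO": 0x81, "IF": 0x82, "THEN": 0x83, "LET": 0x84}


def tokenise(line: str) -> bytes:
    """Replace keywords outside string literals with one-byte tokens (ASCII input assumed)."""
    out = bytearray()
    upper = line.upper()          # keywords are matched case-insensitively
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            match = next((w for w in KEYWORDS if upper.startswith(w, i)), None)
            if match is not None:
                out.append(KEYWORDS[match])  # one byte instead of the keyword text
                i += len(match)
                continue
        out.append(ord(ch))                  # everything else is stored as-is
        i += 1
    return bytes(out)


print(tokenise('10 print "GOTO"'))  # b'10 \x80 "GOTO"'
```

Like the classic "crunchers", this naive matcher would also tokenise a keyword embedded in a longer identifier (e.g. the `IF` in `IFFY`); a real tokeniser needs rules for that.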
OS/2 - because choice is a terrible thing to waste.
ank.. the expert I have sitting in the room says that HotSpot JITs everything and then, when it finds a hot path, it optimises. Seeing as he works for Sun Labs and all, I think I'll take his word for it.
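The "find a hot path, then optimise" idea can be sketched with a per-function call counter: run a cheap first-tier version, and once the count crosses a threshold, pay the optimisation cost once and switch to the faster version. This is a minimal sketch, not HotSpot's actual policy. `HOT_THRESHOLD` and `TieredRuntime` are hypothetical, and the "optimiser" is just a hand-written faster replacement standing in for JIT-generated code.

```python
from typing import Callable, Dict

HOT_THRESHOLD = 1000  # hypothetical; real VMs tune such thresholds

Fn = Callable[[int], int]


class TieredRuntime:
    """Count calls per function and swap in an optimised version once it gets hot."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.tier1: Dict[str, Fn] = {}                      # quick first-tier versions
        self.tier2: Dict[str, Fn] = {}                      # optimised versions, once hot
        self.optimisers: Dict[str, Callable[[], Fn]] = {}

    def register(self, name: str, quick: Fn, optimise: Callable[[], Fn]) -> None:
        self.tier1[name] = quick
        self.optimisers[name] = optimise
        self.counts[name] = 0

    def call(self, name: str, arg: int) -> int:
        fast = self.tier2.get(name)
        if fast is not None:
            return fast(arg)                                # hot path: optimised code
        self.counts[name] += 1
        if self.counts[name] >= HOT_THRESHOLD:
            self.tier2[name] = self.optimisers[name]()      # optimise once, when hot
        return self.tier1[name](arg)


def sum_loop(n: int) -> int:
    total = 0
    for i in range(n + 1):
        total += i
    return total


rt = TieredRuntime()
rt.register("sum_to", sum_loop, lambda: (lambda n: n * (n + 1) // 2))
results = [rt.call("sum_to", 100) for _ in range(HOT_THRESHOLD + 5)]
```

Every call returns 5050; after the 1000th call the closed-form version takes over and the counter stops increasing.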
That's cool! There was a program that did the same thing on Sparc: it just looked for the calls to .mul, .div, etc. and replaced them with single instructions. Then it evolved into doing better register allocation and soon became a static optimiser. Someone turned it into a dynamic optimiser. This is all research stuff; Sun doesn't actually sell this program (or give it away).
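A call-to-instruction rewrite of this kind is a peephole pass. On Sparc a `call` has a delay slot, and the delay-slot instruction often sets up an argument, so the rewrite must emit it before the replacement instruction. A minimal sketch, assuming instructions are simple tuples, that `.mul`/`.umul` take their operands in `%o0`/`%o1`, and that only the low result word in `%o0` is needed. `REWRITES` and `peephole` are hypothetical names. `.div` is left out because V8 `sdiv` also reads the Y register as the high word of the dividend, so it needs extra setup. A real tool would also check for branches into the delay slot and for uses of `%o7`.

```python
from typing import Dict, List, Tuple

Insn = Tuple[str, ...]

# Hypothetical rewrite table: software routine -> single hardware instruction.
REWRITES: Dict[str, Insn] = {
    ".mul": ("smul", "%o0", "%o1", "%o0"),
    ".umul": ("umul", "%o0", "%o1", "%o0"),
}


def peephole(code: List[Insn]) -> List[Insn]:
    """Replace 'call .mul'/'call .umul' plus delay slot with a hardware multiply."""
    out: List[Insn] = []
    i = 0
    while i < len(code):
        insn = code[i]
        if insn[0] == "call" and insn[1] in REWRITES and i + 1 < len(code):
            # The delay-slot instruction runs before the callee, so emit it first.
            out.append(code[i + 1])
            out.append(REWRITES[insn[1]])
            i += 2                       # consumed the call and its delay slot
        else:
            out.append(insn)
            i += 1
    return out


before = [("mov", "%l0", "%o0"), ("call", ".mul"), ("mov", "10", "%o1"), ("st", "%o0", "[%fp-4]")]
after = peephole(before)
```

Here `after` is `mov %l0,%o0; mov 10,%o1; smul %o0,%o1,%o0; st %o0,[%fp-4]`, with the call gone.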
Reading the related paper about the use of code rearranging (let's not call it code-morphing lest we get a patent infringement notice from Transmeta) for Java optimisation shows some of it being used as early as 1997 (isn't this pre-Transmeta? Prior art?). The PDF covers the idea of converting Java bytecode into RISC code (PowerPC; it is IBM =) ) that is then scheduled in some clever way to give a degree of parallelisation on the right hardware. Hmm, this does smell like Transmeta. One of the guys working on that Java paper has a few optimisation patents in his name.
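"Scheduling for parallelism" can be illustrated with greedy list scheduling: each operation goes into the earliest bundle that comes after every earlier operation it conflicts with (read-after-write, write-after-write, write-after-read), subject to a bundle width. This is a generic textbook technique, not the algorithm from the paper. The sketch assumes straight-line, register-only code with one-cycle latencies, and ignores memory ordering and DAISY-style tree branches. `schedule` is a hypothetical name.

```python
from typing import List, Tuple

Op = Tuple[str, str, Tuple[str, ...]]  # (name, destination register, source registers)


def schedule(ops: List[Op], width: int = 4) -> List[List[str]]:
    """Greedily pack straight-line register code into VLIW-style bundles."""
    cycle_of: List[int] = []
    bundles: List[List[str]] = []
    for i, (name, dst, srcs) in enumerate(ops):
        earliest = 0
        for j in range(i):
            _, pdst, psrcs = ops[j]
            raw = pdst in srcs           # we read what an earlier op wrote
            waw = pdst == dst            # we overwrite an earlier op's result
            war = dst in psrcs           # we overwrite something an earlier op reads
            if raw or waw or war:
                earliest = max(earliest, cycle_of[j] + 1)
        c = earliest
        while c < len(bundles) and len(bundles[c]) >= width:
            c += 1                       # bundle full: try the next cycle
        while len(bundles) <= c:
            bundles.append([])
        bundles[c].append(name)
        cycle_of.append(c)
    return bundles


ops = [("a", "r1", ("r0",)), ("b", "r2", ("r0",)), ("c", "r3", ("r1", "r2")), ("d", "r4", ("r0",))]
```

With `width=4`, `d` is independent of `c` and moves up beside `a` and `b`, giving `[["a", "b", "d"], ["c"]]`; with `width=2` the result is `[["a", "b"], ["c", "d"]]`.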
--
Jon - TheSpork
What about fuel cells for power? I vaguely remember reading somewhere that they can theoretically make these small enough to fit into mobile phones, palmtops, etc. Think of topping up your cellphone with a thimbleful of alcohol and running it for two months.
Never trust a man in a blue trench coat, Never drive a car when you're dead
From what I understand of it, this is essentially what Java does: compile code for a virtual machine, and then emulate it on different OSes and processors. Of course, there are some obvious limitations to Java, most notably the extra memory it uses and the decreased speed of Java applications, but that's the sort of thing that could be expected in any project like this.
Actually, thinking about the speed limitations of Java... do any of the JIT runtimes optimize code when they translate it? If it works as well as it's supposed to for Transmeta, I'd think the same principle could be applied in Java. Anyone out there know if it's being done, or why it wouldn't work?
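One optimisation a translator can apply while it translates is constant folding: sub-expressions whose operands are known at translation time are computed once, instead of on every execution. A minimal sketch, assuming a hypothetical four-opcode stack bytecode (not real JVM bytecode) and using Python source plus `eval` as a stand-in for emitting native code; `eval` is only acceptable here because the input is trusted.

```python
from typing import Callable, List, Tuple, Union


def translate(code: List[tuple]) -> Tuple[Callable[[int], int], str]:
    """Translate toy stack bytecode into a one-argument function, folding constants."""
    stack: List[Union[int, str]] = []   # ints are known constants, strs are expressions
    for insn in code:
        op = insn[0]
        if op == "push":
            stack.append(int(insn[1]))
        elif op == "load_x":
            stack.append("x")
        elif op in ("add", "mul"):
            b = stack.pop()
            a = stack.pop()
            if isinstance(a, int) and isinstance(b, int):
                # Both operands known: compute now, at translation time.
                stack.append(a + b if op == "add" else a * b)
            else:
                sym = "+" if op == "add" else "*"
                stack.append(f"({a} {sym} {b})")
        else:
            raise ValueError(f"unknown opcode {op}")
    (result,) = stack                    # well-formed code leaves exactly one value
    source = f"lambda x: {result}"
    return eval(source), source


fn, src = translate([("load_x",), ("push", 2), ("push", 3), ("add",), ("mul",)])
```

The bytecode computes `x * (2 + 3)`, but the translated function is `lambda x: (x * 5)`, so the addition never runs at execution time.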
--Moss
This is a
Now there are two of them.
There are two _____.
Actually, we've heard some horror stories about Sun's Sparc machines. Because there are so many different versions of the processor, people often distribute just one binary: the lowest common denominator. This makes customer support a hell of a lot easier when someone says "it crashed at 0x...", because they don't have to ask the person which version of the binary they are running, etc. Apparently people do this a LOT, and your shiny new V9 is no faster than your neighbour's V7 (V7 code can't even do a multiplication in a single instruction!). So if you had a Sparc -> Sparc binary translator, you could make the thing run faster.