Next Generation Chip Research
Nyxs writes to tell us Sci-Tech-Today is reporting that researchers at the University of Texas are taking a new approach to designing microprocessor architecture. Doug Burger, a computer science professor at the University of Texas, and his colleagues hope to solve many of the pressing problems facing chip designers today with the new "microprocessor and instruction set architecture called Trips, or the Teraop Reliable Intelligently Adaptive Processing System."
It doesn't actually look any different. 128 instructions per "block" executed in parallel, just like a superscalar processor. This has been around since the time of the Pentiums (the Pentiums weren't VLIW, though). What exactly is new?
There are 11 types of people. Those who understand binary, those who don't and those who are sick of this lame joke.
I seem to remember that Intel designed Merced (now the Itanium, known colloquially as the Itanic to reflect how well it's gone in the marketplace) to shift the burden of branch prediction and parallelism to the compiler. Or, in other words, the compiler was expected to mark instructions that were capable of running in parallel, and also to state which branches were likely to be taken.
All a great idea in theory; after all, the compiler should be able to figure out a fair amount of this information just by looking at the flow of data through the instructions (although it may not be so good at branch prediction; I'm not sufficiently strong on compiler theory and branch prediction to talk about that.) However, as can be seen by Itanium's (lack of) market success, the compiler technology just isn't there (or maybe we're using the wrong languages; there are, after all, languages that are designed to be inherently parallel.)
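To make the "look at the flow of data" part concrete, here is a minimal sketch of how a compiler might group instructions that can run in parallel. The toy instruction format and the `bundle` helper are my own assumptions for illustration, not anything from Itanium or Trips, and it assumes every register is written exactly once so that only true (read-after-write) dependencies matter.

```python
from collections import defaultdict

# Hypothetical toy instruction format: (destination, operation, sources).
# Assumption: every register is written exactly once, so only true
# (read-after-write) dependencies need to be tracked.
PROGRAM = [
    ("r1", "load", []),
    ("r2", "load", []),
    ("r3", "add", ["r1", "r2"]),
    ("r4", "mul", ["r1", "r1"]),
    ("r5", "sub", ["r3", "r4"]),
]

def bundle(program):
    """Group instructions into bundles whose members do not depend on each other."""
    level = {}                   # register -> index of the bundle that produces it
    bundles = defaultdict(list)  # bundle index -> instructions in that bundle
    for dest, op, srcs in program:
        # An instruction can issue one step after its latest producer;
        # instructions with no sources land in bundle 0.
        depth = 1 + max((level[s] for s in srcs), default=-1)
        level[dest] = depth
        bundles[depth].append((dest, op, srcs))
    return [bundles[i] for i in sorted(bundles)]

for index, group in enumerate(bundle(PROGRAM)):
    print(index, [dest for dest, _, _ in group])
```

Real compilers also have to deal with memory aliasing, register reuse and branches, which is where it gets hard.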
If this team can get it working the way they want to, maybe -- just maybe -- Itanium will find its niche after all. But let's not kid ourselves; this is a hard problem, and it's more likely that they'll make incremental improvements to the knowledge that's out there, rather than a major breakthrough.
> their code for parallel processing, and that's difficult or impossible for some applications.
>
> "The industry is running into a programmability wall, passing the buck to software and hoping the programmer
> will be able to write codes for their systems," he says.
So you want the programmer to be unaware of the parallel processing. Then the article goes off and says something stupid IMHO.
> a huge amount of control logic, control transistors that don't do any work -- they just consume power. Trips is trying to push some of that complexity back up into the compiler
I thought the point of TRIPS was to make the chip do all the scheduling (i.e. the dataflow architecture) rather than depend on the compiler-generated sequence of instructions. As a hobbyist compiler dev, I'd like to note that data flow, in the form of a DAG, is the basis of all compiler optimizers, though the typical compiler dev is likely to use it to allocate registers and minimize pipeline stalls. I admit that it can be done at the CPU level to some extent - then this is even stranger.
> Trips compiler sends executable code to the hardware in blocks of up to 128 instructions. The processor "sees" and executes a block all at once, as if it were a single instruction, greatly decreasing the overhead associated
Somehow this just shifts the hard work of peephole optimisation to the CPU, to be done in real time. It would have been far better to do it properly in the compiler - something which needs extra memory and lots more processing than the code that is being executed.
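For anyone unfamiliar with the dataflow idea, here is a rough sketch of the firing rule as I understand it: an instruction runs as soon as all of its operands have arrived, regardless of program order. The `BLOCK` layout and `run_dataflow` are made up for illustration and say nothing about how the actual Trips hardware does it.

```python
import operator

# Hypothetical block: name -> (function, names of the operands it waits on).
BLOCK = {
    "a": (lambda: 6, []),
    "b": (lambda: 7, []),
    "sum": (operator.add, ["a", "b"]),
    "prod": (operator.mul, ["a", "b"]),
    "out": (operator.sub, ["prod", "sum"]),
}

def run_dataflow(block):
    """Fire every instruction whose operands are ready, one wave per step."""
    values = {}            # results produced so far
    waves = []             # which instructions fired together in each step
    pending = dict(block)  # instructions still waiting on operands
    while pending:
        ready = [name for name, (_, srcs) in pending.items()
                 if all(s in values for s in srcs)]
        if not ready:
            raise ValueError("cyclic or unsatisfied dependency")
        # Everything in `ready` is independent, so it could run in parallel.
        results = {name: pending[name][0](*(values[s] for s in pending[name][1]))
                   for name in ready}
        values.update(results)
        for name in ready:
            del pending[name]
        waves.append(sorted(ready))
    return values, waves

values, waves = run_dataflow(BLOCK)
print(waves)
print(values["out"])
```

Note that nothing here needed a fixed instruction order; the dependencies alone decide when things fire.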
All in all, I don't see this thing revolutionizing general-purpose programming systems. Though what I call special-purpose programming might be the way the future of programming might go - I'm no Gordon Moore.
Quidquid latine dictum sit, altum videtur
This looks to me to be a combination of old and not so good ideas.
I read about out-of-order execution and using data when ready at least 5 years ago in Hennessy and Patterson's book "Computer Architecture: A Quantitative Approach". To me it sounds like a typical scoreboarding architecture.
And how he can claim that this will lead to less control logic, someone else might be able to explain to me.
As for executing two instructions at once since their destination and value are the same, that sounds like an operation that will lead to more control logic. Besides, don't most compilers optimize away these kinds of cases?
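The compiler-side version of that is common subexpression elimination. A minimal sketch using local value numbering, over a toy IR I made up for the purpose (tuples of destination, operation, sources; each register assumed to be written only once):

```python
def eliminate_common_subexpressions(program):
    """Local value numbering over a hypothetical (dest, op, *sources) IR.

    Assumption: each register is written exactly once, so a previously
    computed value never goes stale.
    """
    seen = {}    # (op, sources) -> register already holding that value
    alias = {}   # eliminated register -> register to use instead
    kept = []
    for dest, op, *srcs in program:
        # Rewrite sources that refer to an eliminated duplicate.
        srcs = tuple(alias.get(s, s) for s in srcs)
        key = (op, srcs)
        if key in seen:
            alias[dest] = seen[key]   # same computation: reuse the earlier result
        else:
            seen[key] = dest
            kept.append((dest, op, *srcs))
    return kept, alias

CODE = [
    ("t1", "add", "x", "y"),
    ("t2", "add", "x", "y"),   # duplicate of t1
    ("t3", "mul", "t1", "t2"),
]
kept, alias = eliminate_common_subexpressions(CODE)
print(kept)
print(alias)
```

This sketch treats `add x y` and `add y x` as different; a real optimizer would normalize commutative operations first.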
"This message was brought to you by Sarcasm and Troll Feeders United (or STFU, for you un-hip people)."
Having a routed network on which data can travel between function units, without being merely copies of data assigned to a register file, is not very mainstream (unlike, say, register bypass). It's not really a new idea; getting it to work well would be, though.
Why wouldn't they have CS programs in Texas?
What, you think all they teach at Texas universities is agriculture and oil-related subjects?
Don't judge Texas until you've spent some time there. I hate the place, but I'm from Oklahoma where hating Texas is a requirement of citizenship.
Those who can't do, teach. Those who can't teach either, do tech support.
I had an interesting discussion with a chip designer the other day. We were talking about parallel processing, and I spouted the usual perceived wisdom "But isn't the problem with parallel processing that many problems are very difficult or impossible to do in parallel? And isn't programming in parallel really difficult?"
I found his answer very interesting, something like "that line of thinking comes from when computers weren't fast enough to do the basic things we wanted to do with them. It's true, an application like a word processor is not a good problem to tackle with parallel processing - but we don't need to these days. Nearly all the stuff we want to do today - faster graphics, 3D, video, image and sound processing, processing massive amounts of data on the web, all the processing that goes into keeping the internet and telephone networks going - all of these problems are ideal for parallel processing. What Google does - that's essentially parallel processing, isn't it?"
That kind of changed my perception of things and made me realise my mindset was way out of date.
Actually, it sounds more like an FPGA. And, since VHDL is Turing-equivalent, it would actually be possible to compile C code (such as the Linux kernel) into a gate array and run it on such a chip.