Next Generation Chip Research
Nyxs writes to tell us Sci-Tech-Today is reporting that researchers at the University of Texas are taking a new approach to designing microprocessor architecture. Doug Berger, a computer science professor at the University of Texas, and his colleagues hope to solve many of the pressing problems facing chip designers today with the new "microprocessor and instruction set architecture called Trips, or the Teraop Reliable Intelligently Adaptive Processing System."
Apparently, one of the pressing problems that chip designers are facing is coming up with stupid, meaningless acronyms.
It doesn't actually look any different. 128 instructions per "block" executed in parallel, just like a superscalar processor. This has been around since the time of the Pentiums (the Pentiums weren't VLIW, though). What exactly is new?
There are 11 types of people. Those who understand binary, those who don't and those who are sick of this lame joke.
Branches can be predicted with fairly high accuracy. And most new architectures have some form of speculation in the core. And they actually execute 16 instructions at once. Only their word is 128 instructions long.
There are 11 types of people. Those who understand binary, those who don't and those who are sick of this lame joke.
We can easily understand how a loop could be calculated as a function, if the contents of the loop block are composed solely of calculations. When this occurs, the output of the loop is simply a function of its input (f(x), if you will). However, computer scientists who think that programs can always be reduced to a simple function with given inputs have their heads too far in their books to see how the real world forces programs to be far removed from that ivory tower gobbledygook.
In the real world, you aren't typically performing calculations in loops. Rather, you are usually reading and writing to memory, which may or may not be cached. So it isn't just a matter of saying f(x), it is much more complicated and possibly dependent on memory which you have no way to determine until the loop iteration reaches that point. And then you'll still get the bottlenecks which plague us today. Memory isn't fast enough, devices aren't fast enough, too much time is spent waiting for I/O to complete.
Pushing as much brute-force computation off onto compilers is fine. Let them unroll loops and optimize functions. But what are the limits to this? Can we really optimize our way to 1-step loops? I don't think so, but the DOD seems to think it is possible.
Jesus saved me from my past. He can save you as well.
Bugs on the chip can lead to bad Trips
So Long and Thanks for All the Fish!
I alluded to this in my earlier post. Some mathematical operations are simply loops over a seed input. A summation is one example. You can reduce the calculation of a summation from a long series (infinite, perhaps) of functions executed in a loop to a single function which is valid for all inputs (voila, Calculus).
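To make the summation example concrete, here is a minimal sketch in Python. The function names are made up for illustration, and this says nothing about what the Trips compiler actually does; it only shows a loop being replaced by a single formula that is valid for all inputs.

```python
def sum_by_loop(n):
    """Add 1 + 2 + ... + n one step at a time (n loop iterations)."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_closed_form(n):
    """Same result from the closed form n(n+1)/2, with no loop at all."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    # Both calls compute the same sum; only the amount of work differs.
    print(sum_by_loop(128), sum_closed_form(128))
```

The catch is that this only works because the loop body is pure arithmetic on its own counter. Once the body reads memory that isn't known ahead of time, there is no formula to substitute.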
So they say they can take loops in 128 blocks at a time and calculate the result in less than 128 loop steps. They are requiring the compiler to come up with a valid function for those 128 steps that will work for any initial parameters. If it works, it means that you are no longer executing 128 times, but only once. That is a speed-up of just over 2 orders of magnitude. Really, really amazing.
But does it work? Can they really ask the compiler to do that much work? Is the compiler capable of being that smart? The main thing I wonder is how well this works, and how optimized it can get when the main purpose of looping is not to calculate functions but to access memory which is itself not fast.
Jesus saved me from my past. He can save you as well.
> their code for parallel processing, and that's difficult or impossible for some applications.
>
> "The industry is running into a programmability wall, passing the buck to software and hoping the programmer
> will be able to write codes for their systems," he says.
So you want the programmer to be unaware of the parallel processing. Then the article goes off and says something stupid IMHO.
> a huge amount of control logic, control transistors that don't do any work -- they just consume power. Trips is trying to push some of that complexity back up into the compiler
I thought the point of TRIPS was to make the chip do all the scheduling (ie the Data Flow architecture) rather than depend on the compiler generated sequence of instructions. As a hobbyist compiler dev, I'd like to note that the data flow architecture is the basis of all compiler optimizers (DAG), though the typical compiler dev is likely to use this input to allocate registers to minimize pipeline stalls. I admit that it can be done at the CPU level to some extent - then this is even stranger.
> Trips compiler sends executable code to the hardware in blocks of up to 128 instructions. The processor "sees" and executes a block all at once, as if it were a single instruction, greatly decreasing the overhead associated
Somehow this just shifts the hard work of peephole optimisation to the CPU to be done at real time. It would have been far better to do it in the compiler properly - something which needs extra memory and lots more processing than the code that is being executed.
All in all, I don't see this thing revolutionizing General purpose programming systems. Though what I call special purpose programming might be the way the future of programming might go - I'm no Gordon Moore.
Quidquid latine dictum sit, altum videtur
What this is *not* in any form is a general purpose CPU. It won't boot Linux, plain and simple. This is for doing stream data processing such as compression or HPC simulations. I seem to remember their presentation showing a prototype doing software-radio at a data rate usable for 802.11.
I had an interesting discussion with a chip designer the other day. We were talking about parallel processing, and I spouted the usual perceived wisdom "But isn't the problem with parallel processing that many problems are very difficult or impossible to do in parallel? And isn't programming in parallel really difficult?"
I found his answer very interesting, something like "that line of thinking comes from when computers weren't fast enough to do the basic things we wanted them to do. It's true, an application like a word processor is not a good problem to tackle with parallel processing - but we don't need to these days. Nearly all the stuff we want to do today - faster graphics, 3D video image and sound processing, processing massive amounts of data on the web, all the processing that goes into keeping the internet and telephone networks going - all of these problems are ideal for parallel processing. What Google does - that's essentially parallel processing, isn't it?"
That kind of changed my perception of things and made me realise my mindset was way out of date.
It seems to me any serious research into microprocessors will be hampered by the fact that it will be completely inapplicable unless it dumbs itself down to ape the x86 instruction set. All current and future processor design advances will be defined as better and faster ways of making modern silicon pretend it's a member of a chip family that was obsolete when the first President Bush was in office. That's not progress. That's just kind of sad.
Heaven help any researcher if implementing their new chip design requires a new software paradigm that doesn't fit neatly into the OS/Application model, too. We're living in the perpetual now of 2000, and it's some boring shit. I want my future back.
Bah.
SoupIsGood Food
This is not some boring superscalar! Nor is it some vector processor!
In fact this is a complete departure from a von Neumann architecture. The architecture is called a Dataflow architecture. In one sentence: a dataflow architecture is one where instruction execution is based on the availability of the instruction's inputs, not a program counter.
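As a toy illustration of that one-sentence definition, here is a small Python sketch. Everything in it (run_dataflow, the program table, the value names) is hypothetical and has nothing to do with the real TRIPS instruction set; it just shows instructions firing when their inputs exist rather than in the order they are listed.

```python
import operator


def run_dataflow(instructions, inputs):
    """Fire each instruction once all of its operands are available.

    instructions: dict mapping a result name to (function, operand names).
    inputs: dict of values that are available from the start.
    Returns the final values and the order in which instructions fired.
    """
    values = dict(inputs)
    pending = dict(instructions)
    fired = []
    while pending:
        # An instruction is ready when every operand already has a value.
        ready = [name for name, (_, operands) in pending.items()
                 if all(op in values for op in operands)]
        if not ready:
            raise ValueError("no instruction is ready: missing input or cycle")
        for name in ready:
            func, operands = pending.pop(name)
            values[name] = func(*(values[op] for op in operands))
            fired.append(name)
    return values, fired


# Listed "backwards" on purpose: listing order does not decide execution order.
program = {
    "result": (operator.mul, ("sum", "diff")),
    "diff": (operator.sub, ("a", "b")),
    "sum": (operator.add, ("a", "b")),
}
values, fired = run_dataflow(program, {"a": 5, "b": 3})
```

There is no program counter anywhere in that loop. "sum" and "diff" become ready at the same time (real hardware could run them in parallel), and "result" fires only after both have produced values.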
The article does a very bad job at conveying the fact that this is a relatively new idea. Like most reporting, it presents something that's been in research for some time as a huge breakthrough without describing it at all. Instead it's really just an incremental step in dataflow computing research.
I work in a lab at the University of Washington on another dataflow architecture. It's a really interesting idea but it will take some time to develop and you're not going to get one on your desk for some years to come.
I recommend you read this paper. It gives a great overall picture of what TRIPS is all about and is actually really cool. (I read it about a year ago).
I am an ECE grad student at UT Austin so I know TRIPS quite well. In fact I often speak with Doug Burger himself because he's the faculty advisor for the UT Marathon team, of which I am a member. (By the way, his name is "Burger" not "Berger"). I think TRIPS is an awesome concept and it's exactly the kind of project that I wanted to be a part of when I became a grad student at UT. I also know Steve Keckler because I'm taking his advanced computer architecture course this semester, and we're actually spending a good chunk of time talking about TRIPS (course schedule).
Hero of Allacrost, a FOSS RPG for *NIX/*BSD/OS X/Win