Slashdot Mirror


Next Generation Chip Research

Nyxs writes to tell us Sci-Tech-Today is reporting that researchers at the University of Texas are taking a new approach to designing microprocessor architecture. Doug Berger, a computer science professor at the University of Texas, and his colleagues hope to solve many of the pressing problems facing chip designers today with the new "microprocessor and instruction set architecture called Trips, or the Teraop Reliable Intelligently Adaptive Processing System."

14 of 174 comments

  1. Branching by shmlco · · Score: 2, Interesting

    The article states that this works by sending blocks of up to 128 instructions at a time to the processor, where "The processor "sees" and executes a block all at once, as if it were a single instruction..." Makes you wonder if they'd ever get close to that target, as IIRC, one instruction in seven on average is a conditional branch.
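
    A rough sketch of the arithmetic behind that doubt. The 128-instruction block size is from the article; the one-in-seven branch ratio is only the recollection above, not a measured figure, and the function name is made up for illustration.

```python
BLOCK_SIZE = 128     # maximum instructions per block, per the article
BRANCH_EVERY = 7     # assumed: one conditional branch per seven instructions


def expected_branches(block_size: int, branch_every: int) -> float:
    """Expected number of conditional branches inside one full block."""
    return block_size / branch_every


if __name__ == "__main__":
    # Roughly how many branches a compiler would have to predicate away or
    # predict in order to fill a single 128-instruction block.
    print(round(expected_branches(BLOCK_SIZE, BRANCH_EVERY), 1))
```

    Under that assumption a full block would span about 18 conditional branches, which is why filling 128 slots looks hard without predication or very good prediction.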

    --
    Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
  2. Loops as functions? by ReformedExCon · · Score: 4, Interesting

    We can easily understand how a loop could be calculated as a function, if the contents of the loop block are composed solely of calculations. When this occurs, the output of the loop is simply a function of its input (f(x), if you will). However, computer scientists who think that programs can always be reduced to a simple function with given inputs have their heads too far in their books to see how the real world forces programs to be far removed from that ivory tower gobbledygook.

    In the real world, you aren't typically performing calculations in loops. Rather, you are usually reading and writing to memory, which may or may not be cached. So it isn't just a matter of saying f(x), it is much more complicated and possibly dependent on memory which you have no way to determine until the loop iteration reaches that point. And then you'll still get the bottlenecks which plague us today. Memory isn't fast enough, devices aren't fast enough, too much time is spent waiting for I/O to complete.
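
    A minimal sketch of the two kinds of loop being contrasted here. Both functions and the list-based "memory" are hypothetical stand-ins, not anything from the article.

```python
def pure_loop(x: int, steps: int) -> int:
    """Calculation only: the result is a function of the inputs, f(x)."""
    for _ in range(steps):
        x = 3 * x + 1
    return x


def pointer_chase(memory: list, start: int, steps: int) -> int:
    """Memory-bound: each address comes out of the previous load."""
    i = start
    for _ in range(steps):
        # The next index is unknown until this read completes, so the
        # iterations cannot be collapsed without knowing memory's contents.
        i = memory[i]
    return i


if __name__ == "__main__":
    print(pure_loop(1, 3))
    print(pointer_chase([2, 0, 1], 0, 4))
```

    The first loop could in principle be replaced by a formula in x and steps; the second depends on whatever happens to be in memory at run time.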

    Pushing as much brute-force computation off onto compilers is fine. Let them unroll loops and optimize functions. But what are the limits to this? Can we really optimize our way to 1-step loops? I don't think so, but the DOD seems to think it is possible.

    --
    Jesus saved me from my past. He can save you as well.
  3. Re:pressing problems by shanen · · Score: 3, Interesting
    Small world, eh? A comment about the acronym, and my first reaction to the article was to remember TRAC, the Texas Reconfigurable Array Computer, which was something they were working on at the same school many years ago. Well, at least they didn't need "Texas" for the acronym this time, but I doubt anyone else remembers TRAC now.

    Disclaimer: In spite of having a degree from the school, I have a very low opinion of it. Yeah, it's large enough physically, and they had some oil money, but IMO they optimized towards narrow-minded mind-narrowing efficiency rather than breadth. Real education is about the breadth. Unfortunately, these days I feel as though my real alma mater seems to be following a similar path to mediocrity.

    --
    Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  4. Boring by Rufus211 · · Score: 2, Interesting
    So glancing over the article, it doesn't look like they're actually doing anything "new." They're basically expanding on register renaming, speculative execution, and the like, which makes the CPU's job slightly easier. Also, their bit about data flow and "direct target encoding" sounds oddly like this patent by Cray from 1976 (!).

    Overall they might make some things marginally more efficient, but they aren't solving any fundamental problems. They're simply moving some of them around slightly.

  5. Reduction in register use by Cave_Monster · · Score: 2, Interesting
    FTA ... Finally, data flow execution is enabled by "direct target encoding," by which the results from one instruction go directly to the next consuming instruction without being temporarily stored in a centralized register file.

    This sounds really cool.

  6. Re:I don't get it... by ReformedExCon · · Score: 4, Interesting

    I alluded to this in my earlier post. Some mathematical operations are simply loops over a seed input. A summation is one example. You can reduce the calculation of a summation from a long series (infinite, perhaps) of functions executed in a loop to a single function which is valid for all inputs (voila, Calculus).
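
    The textbook case of this, as a small sketch: summing 1..n in a loop versus the closed form n(n+1)/2. The function names are made up for illustration.

```python
def sum_by_loop(n: int) -> int:
    """n iterations, one addition each."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_closed_form(n: int) -> int:
    """One multiply, one add, one divide, regardless of n."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    print(sum_by_loop(128), sum_closed_form(128))
```

    Both return the same value for any non-negative n; only the loop's cost grows with n.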

    So they say they can take loops in blocks of 128 at a time and calculate the result in fewer than 128 loop steps. They are requiring the compiler to come up with a valid function for those 128 steps that will work for any initial parameters. If it works, it means that you are no longer executing 128 times, but only once. That is a speed-up of just over 2 orders of magnitude. Really, really amazing.

    But does it work? Can they really ask the compiler to do that much work? Is the compiler capable of being that smart? The main thing I wonder is how well this works, and how optimized it can get when the main purpose of looping is not to calculate functions but to access memory which is itself not fast.

    --
    Jesus saved me from my past. He can save you as well.
  7. VLIW (superscalar) ? by silverbyte · · Score: 3, Interesting

    Is it just me, or does this approach sound very similar to VLIW (http://en.wikipedia.org/wiki/VLIW) architecture? The problem is that the branch prediction needs to be very accurate for any kind of performance boost.
    Which is why these types of architecture lend themselves very well to sequences of operations that are very similar (video processing, etc.).
    Will this work just as well in the general-computing sphere? No idea.

  8. Call me bitter, but... by SoupIsGood+Food · · Score: 4, Interesting

    It seems to me any serious research into microprocessors will be hampered by the fact that it will be completely inapplicable unless it dumbs itself down to ape the x86 instruction set. All current and future processor design advances will be defined as better and faster ways of making modern silicon pretend it's a member of a chip family that was obsolete when the first President Bush was in office. That's not progress. That's just kind of sad.

    Heaven help any researcher if implementing their new chip design requires a new software paradigm that doesn't fit neatly into the OS/Application model, too. We're living in the perpetual now of 2000, and it's some boring shit. I want my future back.

    Bah.

    SoupIsGood Food

  9. Re:The article is too high level by Deflatamouse! · · Score: 2, Interesting

    This is so true. We have designs that are broken on paper but work perfectly fine in silicon. But of course, on paper we assume the worst case for most things and are probably overly pessimistic.

    What ends up happening is that parts are cherry picked before they're sold (with the costs passed down to the customers) or that the parts are binned and sold at different levels such as the case for Intel chips.

    Increasingly methods to improve yield rates drive some of the design decisions, sometimes even at the architectural level, especially as the processes continue to shrink.

  10. You've all got the wrong idea by Takahashi · · Score: 5, Interesting

    This is not some boring superscalar! Nor is it some vector processor!

    In fact this is a complete departure from a von Neumann architecture. The architecture is called a dataflow architecture. In one sentence, a dataflow architecture is one where instruction execution is based on the availability of the instruction's inputs, not a program counter.
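
    A toy sketch of that firing rule, under loud assumptions: the Instr class, the run function and the (consumer index, operand slot) target format are all invented here for illustration and say nothing about how the real TRIPS hardware or ISA encodes anything. Each instruction names its consumers directly instead of writing to a shared register file, and it fires as soon as all of its operands have arrived.

```python
from dataclasses import dataclass, field
from operator import add, mul
from typing import Callable


@dataclass
class Instr:
    op: Callable
    arity: int
    targets: list                      # (consumer index, operand slot) pairs
    operands: dict = field(default_factory=dict)


def run(instrs, initial_tokens):
    """Fire any instruction whose operands are all present; no program counter."""
    results = {}
    ready = []

    def deliver(idx, slot, value):
        ins = instrs[idx]
        ins.operands[slot] = value
        if len(ins.operands) == ins.arity:
            ready.append(idx)          # all inputs available: eligible to fire

    for idx, slot, value in initial_tokens:
        deliver(idx, slot, value)

    while ready:
        idx = ready.pop()              # order among ready instructions is arbitrary
        ins = instrs[idx]
        value = ins.op(*(ins.operands[s] for s in range(ins.arity)))
        results[idx] = value
        for t_idx, t_slot in ins.targets:
            deliver(t_idx, t_slot, value)   # forward straight to the consumer
    return results


if __name__ == "__main__":
    # (a + b) * (c + d) with a, b, c, d = 1, 2, 3, 4
    program = [Instr(add, 2, [(2, 0)]), Instr(add, 2, [(2, 1)]), Instr(mul, 2, [])]
    tokens = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
    print(run(program, tokens)[2])
```

    The two additions have no ordering between them; the multiply simply waits until both of its slots are filled.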

    The article does a very bad job at conveying the fact that this is a relatively new idea. Like most reporting, they report something that's been in research for some time as a huge breakthrough without describing it at all. Instead it's really just an incremental step in dataflow computing research.

    I work in a lab at the University of Washington on another dataflow architecture. It's a really interesting idea but it will take some time to develop and you're not going to get one on your desk for some years to come.

  11. Re:I don't get it... by RootsLINUX · · Score: 4, Interesting

    I recommend you read this paper. It gives a great overall picture of what TRIPS is all about and is actually really cool. (I read it about a year ago).

    I am an ECE grad student at UT Austin, so I know TRIPS quite well. In fact I often speak with Doug Burger himself because he's the faculty advisor for the UT Marathon team, of which I am a member. (By the way, his name is "Burger" not "Berger"). I think TRIPS is an awesome concept and it's exactly the kind of project that I wanted to be a part of when I became a grad student at UT. I also know Steve Keckler because I'm taking his advanced computer architecture course this semester, and we're actually spending a good chunk of time talking about TRIPS (course schedule).

    --
    Hero of Allacrost, a FOSS RPG for *NIX/*BSD/OS X/Win
  12. Re:dumb question re: branch prediction by tlambert · · Score: 3, Interesting

    Correct prediction keeps your instruction pipeline full. This is particularly important for CPUs with long pipelines.

    Incorrect prediction results in having to back out CPU state from the speculative execution that has already taken place (this is called "squashing" the mispredicted instructions), and effectively this loses the pipeline slots that were used to perform the mispredicted execution. From an outside perspective, these lost slots look like a pipeline latency.
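
    A back-of-the-envelope model of that cost, using the usual textbook formula for cycles per instruction. Every number below (branch fraction, mispredict rate, penalty) is an assumed illustrative value, not a measurement of any real CPU.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction once squashed slots are counted."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Same predictor accuracy, two assumed pipeline depths.
    for penalty in (5, 20):
        print(penalty, round(effective_cpi(1.0, 1 / 7, 0.05, penalty), 3))
```

    With everything else held fixed, the lost slots scale with the penalty, which is why deeper pipelines care more about prediction accuracy.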

    (insert rude comment about GCC #pragma branch hinting and [lack of] basic block reordering to avoid cache busting on PPC here)

    -- Terry

  13. A chance for pure functional languages to shine. by master_p · · Score: 2, Interesting

    Pure functional programming languages will see a tremendous boost from architectures like Trips. In functional programming languages, variables are never assigned, thus making it possible for all parts of an expression to be executed simultaneously. With 128 instructions, it is possible that lots of algorithms that take lots of time when executed sequentially will take constant time with this new architecture: matrix operations, quicksort, etc.
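
    A small sketch of what "all parts at once" buys for a side-effect-free expression. The nested-tuple representation and both function names are hypothetical. Even with unlimited execution units, the number of steps is bounded below by the longest dependency chain, so how close an algorithm gets to constant time depends on the shape of its expression tree.

```python
# Expressions are nested tuples (op, left, right); leaves are plain names.

def op_count(expr) -> int:
    """Total operations: the step count if they run strictly one after another."""
    if not isinstance(expr, tuple):
        return 0
    _, left, right = expr
    return 1 + op_count(left) + op_count(right)


def critical_path(expr) -> int:
    """Longest dependency chain: the step count with unlimited parallel units."""
    if not isinstance(expr, tuple):
        return 0
    _, left, right = expr
    return 1 + max(critical_path(left), critical_path(right))


balanced = ("*", ("+", "a", "b"), ("+", "c", "d"))   # (a + b) * (c + d)
chained = ("+", ("+", ("+", "a", "b"), "c"), "d")    # ((a + b) + c) + d

if __name__ == "__main__":
    for name, expr in (("balanced", balanced), ("chained", chained)):
        print(name, op_count(expr), critical_path(expr))
```

    Both expressions contain three operations, but only the balanced one lets two of them run side by side.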

  14. An easier to program Itanic-workalike? by UnapprovedThought · · Score: 2, Interesting

    What this is *not* in any form is a general purpose CPU.

    The article doesn't seem to agree:

    One of the big challenges to becoming a mainstream commercial processor is compatibility with existing software and systems, especially x86 compatibility, Moore says. But one way to maintain compatibility would be to use Trips as a co-processor, he says. "The general-purpose [x86] processor could offload heavy tasks onto the co-processor while still handling legacy compatibility on its own."

    So, it looks like they're trying to get Intel or AMD interested in producing a heterogeneous multi-core unit that includes their trippy core, in the hopes of keeping the number of cores (and their communications overhead) down to a minimum. Intel already has a form of (so-called) instruction-level parallelism with the Itanic, and it didn't work out too well (except maybe for crypto-heavy workloads). It's possible AMD will be mulling it over. One of the things they will have to worry about is whether a compiler can actually be written to use it, FTA:

    ... the Trips compiler sends executable code to the hardware in blocks of up to 128 instructions.

    With 128 instructions to schedule at once, that might provide a chance to actually keep all of the processing units on the chip busy. With the Itanic, it was really a challenge to do that, since you had to pull two floating point instructions out from somewhere in every clock cycle, something that not all workloads could accomplish, and I can see the compiler writers going crazy trying to produce some sorts of ultimately self-defeating hacks trying to get that accomplished :)