Slashdot Mirror


Next Generation Chip Research

Nyxs writes to tell us Sci-Tech-Today is reporting that researchers at the University of Texas are taking a new approach to designing microprocessor architecture. Doug Berger, a computer science professor at the University of Texas, and his colleagues hope to solve many of the pressing problems facing chip designers today with the new "microprocessor and instruction set architecture called Trips, or the Teraop Reliable Intelligently Adaptive Processing System."

174 comments

  1. But... by ankarbass · · Score: 0, Redundant

    Is it reliable?

    --
    Wanted: Clever sig, top $ paid, all offers considered.
    1. Re:But... by Anonymous Coward · · Score: 0

      It's intelligent. Duh.

    2. Re:But... by freewaybear · · Score: 0

      Imagine a Beowulf cluster of...

      --
      Registered Linux User #404114
  2. pressing problems by ed__ · · Score: 5, Funny

    apprently, one of the pressing problems that chip designers are facing is coming up with stupid, meaningless acronyms.

    1. Re:pressing problems by shanen · · Score: 3, Interesting
      Small world, eh? A comment about the acronym, and my first reaction to the article was to remember TRAC, the Texas Reconfigurable Array Computer, which was something they were working on at the same school many years ago. Well, at least they didn't need "Texas" for the acronym this time, but I doubt anyone else remembers TRAC now.

      Disclaimer: In spite of having a degree from the school, I have a very low opinion of it. Yeah, it's large enough physically, and they had some oil money, but IMO they optimized towards narrow-minded mind-narrowing efficiency rather than breadth. Real education is about the breadth. Unfortunately, these days I feel as though my real alma mater seems to be following a similar path to mediocrity.

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
    2. Re:pressing problems by Anonymous Coward · · Score: 0
      can someone loan me a spell checker please?

      These guys have one. Checks your spelling as you type. Comes with a bunch of other stuff too.

    3. Re:pressing problems by corngrower · · Score: 1

      Yes, it's called SMAD - Stupid Meaningless Acronym Deficiency.

  3. Is this simply a VLIW architecture? by Anakron · · Score: 4, Insightful

    It doesn't actually look any different. 128 instructions per "block" executed in parallel, just like a superscalar processor. This has been around since the time of the Pentiums (the Pentiums weren't VLIW, though). What exactly is new?

    --
    There are 11 types of people. Those who understand binary, those who don't and those who are sick of this lame joke.
    1. Re:Is this simply a VLIW architecture? by nzkbuk · · Score: 3, Informative

      The thing that's new (if you read the article) is that the instructions AREN'T executed specifically in a parallel fashion.
      They are executed in a JIT (just in time) fashion.
      Currently, with deep pipelines, results can get stored in registers for a few cycles. This design aims to execute each instruction as soon as it can. That way it needs a lot fewer registers to store results.

      It also means instructions are executed out of order AND in parallel, in an effort to both increase speed and decrease chip complexity.
      If you don't have to use a transistor for storage / control, you can use it for the good bits, generating your answer.

    2. Re:Is this simply a VLIW architecture? by Takahashi · · Score: 4, Informative

      It really is different. It's not simply a superscalar. It's a data-flow machine. What this means is that instructions are arranged in a graph based on dependency and execute as soon as all inputs are ready.
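
      To make the "execute as soon as all inputs are ready" rule concrete, here is a minimal Python sketch. It only illustrates the general dataflow firing rule, not the TRIPS design or the UW machine; `run_dataflow` and the graph encoding are made up for this example.

```python
from collections import deque
import operator


def run_dataflow(graph, inputs):
    """Fire each instruction as soon as all of its operands have arrived.

    graph:  {name: (function, [operand names])}  (hypothetical encoding)
    inputs: {name: value} for values that are available at the start
    Returns (all computed values, the order in which instructions fired).
    """
    values = dict(inputs)
    # How many operands each instruction is still waiting for.
    waiting = {
        name: sum(1 for src in srcs if src not in values)
        for name, (_, srcs) in graph.items()
    }
    # For each value, which instructions consume it.
    consumers = {}
    for name, (_, srcs) in graph.items():
        for src in srcs:
            consumers.setdefault(src, []).append(name)

    ready = deque(name for name, count in waiting.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        fn, srcs = graph[name]
        values[name] = fn(*(values[src] for src in srcs))
        order.append(name)
        # Delivering the result may make consumers ready to fire.
        for consumer in consumers.get(name, []):
            waiting[consumer] -= 1
            if waiting[consumer] == 0:
                ready.append(consumer)
    return values, order


# (a + b) * (c - d), deliberately listed with the final multiply first.
graph = {
    "prod": (operator.mul, ["sum", "diff"]),
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["c", "d"]),
}
values, order = run_dataflow(graph, {"a": 1, "b": 2, "c": 7, "d": 3})
```

      In this toy run `sum` and `diff` do not depend on each other, so either could fire first; `prod` can only fire once both results have arrived, regardless of where it sits in the listing.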

      I work in a lab at the University of Washington where we are working on _implementing_ a different data flow machine that shares some of its fundamentals with the UT machine.

    3. Re:Is this simply a VLIW architecture? by joib · · Score: 1


      I work in a lab at the University of Washington where we are working on _implementing_ a different data flow machine that shares some of its fundamentals with the UT machine.


      So, which one is better? ;-)

      For a more serious question, I read the TRIPS overview paper on their site and it all seems to make a lot of sense. So why aren't dataflow machines mainstream? The first papers were published in the early 1980s, not much later than RISC started to make some noise.

    4. Re:Is this simply a VLIW architecture? by Goonie · · Score: 1
      I'm not an architecture expert, but it would seem obvious that all the requisite mechanisms to ensure that computations only occur when the inputs are available are a lot more complex in a dataflow machine. Having a central clock to ensure your timing steps are discrete makes ensuring this much easier (trivial with a non-pipelined chip, non-trivial but much easier with a pipelined CPU). It was probably simpler to just crank up the CPU clock, add extra execution units, and add cache.

      As chip fabrication processes have changed, presumably these equations have changed. The number of transistors available to manage all this data flow control has radically increased; at the same time, clock speeds have increased to the point where the time it takes for the clock signal to propagate across the chip is probably a limiting factor - at 4 GHz, light travels only 7.5 centimetres in the time it takes to complete a clock cycle. Therefore, to make things go faster, we eliminate the bottleneck of the centralised clock and take advantage of the fact we've got all these extra transistors available to sort the data flow constraints out on an as-needed basis.
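
      The 7.5 cm figure is easy to check. A quick Python sketch of the arithmetic follows (the function name is just for illustration); note that it uses the speed of light in a vacuum, and real on-chip signals are slower than that, so the practical budget is tighter.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def distance_per_cycle_cm(clock_hz):
    """Distance light covers in a vacuum during one clock period, in cm."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * period_s * 100.0


# One cycle at 4 GHz lasts 0.25 ns.
cm_at_4ghz = distance_per_cycle_cm(4e9)
```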

      Hopefully somebody who actually knows what they're talking about can clarify my guess.

      --

      Any sufficiently advanced technology is indistinguishable from a rigged demo
      --Andy Finkel (J. Klass?)
    5. Re:Is this simply a VLIW architecture? by RingDev · · Score: 1

      My question is how do they handle Out of Order operations? They mentioned in the article data flow and immediate execution after receiving inputs, which is great, but that could lead to OOO output, couldn't it?

      -Rick

      --
      "Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
    6. Re:Is this simply a VLIW architecture? by fitten · · Score: 1

      Summary of article:

      Extra Deep Out of Order Asynchronous Processor Developed

      I read the article, the only thing "new" was that it delivers a "block" of instructions at a time that is a bit bigger than current CPUs. They take this and combine it with standard asynchronous logic design and call it by a silly acronym.

      On a side note... I bet a page fault is pretty exciting ;)

    7. Re:Is this simply a VLIW architecture? by stevew · · Score: 1

      Sounds an awful lot like Transmeta???

      I don't see a whole lot of difference - they are using JIT techniques to get around the recompile for new hardware problems of VLIW. Beyond that it's just VLIW warmed over.

      Maybe they have some new ideas within the VLIW compiler space?

      In any case, I don't see it as revolutionary...more like evolutionary - and even then, just barely.

      --
      Have you compiled your kernel today??
    8. Re:Is this simply a VLIW architecture? by flaming-opus · · Score: 1

      Multiple independent instructions passed to the processor at once, yeah, that sounds a lot like Itanium to me. Regardless of the "data flow magic no-register jump-and-shout" it should still face the same problems Itaniums see: finding enough parallelism in the code to put in a common block. Pushing the complexity out to the compiler is only going to benefit you at all if the hardware was doing a poor job of extracting parallelism, but taking a broader, more dynamic look will reveal parallelism. Good in some cases, but there are a lot of times where there just isn't that much parallelism. One is still backed up on the fact that memory is REALLY, REALLY slow compared to pushing numbers between registers. /natch

    9. Re:Is this simply a VLIW architecture? by AuMatar · · Score: 1

      Out of order execution with multiple ALUs is nothing new either; both the Pentium and Athlon lines have it. The reason the Itanium was so dog slow was that the original didn't have OOO execution; it expected the compiler to do it for it. So I still see nothing new here, other than scaling up of existing tech.

      --
      I still have more fans than freaks. WTF is wrong with you people?
    10. Re:Is this simply a VLIW architecture? by eric_harris_76 · · Score: 1

      Sounds like the Control Data 6000 series machines, though no doubt done to a much greater degree and with more speed. (On those things you could count the transistors without using a microscope, after all.)

      They had a part of the CPU called the "scoreboard" which kept track of which registers were in the process of getting results and which instructions were waiting on register values and which "functional units" (ALU parts) were in use calculating results. As instructions would complete and make their respective results and functional units available, other instructions would start up.

      Oh, and most of the A registers and their associated X registers were used to move data between the CPU and memory. So setting an A register to an address could result in a memory read (A1..A5) or write (A6..A7) for the associated X register. Complicated things a bit, I suppose.

      Now to actually RTFA to see if there is anything to add to the above. (Or apologize for.)

      -Eric

      --
      There's no time like the present. Well, the past used to be.
  4. No by laptop006 · · Score: 1

    All the apps keep tripping up...

    --
    /* FUCK - The F-word is here so that you can grep for it */
    1. Re:No by Anakron · · Score: 2, Funny

      Or perhaps it was the designers tripping..

      --
      There are 11 types of people. Those who understand binary, those who don't and those who are sick of this lame joke.
  5. Cue the... by Anonymous Coward · · Score: 0

    OMGWTFBBQ it's the University of Texas at Austin!

    1. Re:Cue the... by Krach42 · · Score: 1

      Pff... like anyone cares about University of Texas at El Paso.

      (Note: I'm alumnus of New Mexico State University, a rival school, so take this with a grain of salt.)

      --

      I am unamerican, and proud of it!
    2. Re:Cue the... by Anonymous Coward · · Score: 0

      Hey genius, it's the University of Texas at AUSTIN. So I guess New Mexico State University doesn't teach how to READ...

    3. Re:Cue the... by Krach42 · · Score: 1

      I'm commenting about the parent requesting specification that this particular story happened at UT Austin. Such a clarification wouldn't be necessary if there weren't other Universities of Texas in other cities. For example, El Paso.

      So, hey, Genius. I'm teasing him that of course it's UT Austin, because no one cares about UTEP.

      shit, why don't you try understanding posts, and implicit statements contained within them. Oh that's right. 90% of the slashdot crowd gets mad when you do that, because they usually imply stuff that they don't agree with explicitly, so they get mad when you do that.

      I do not care. If I said it and it implies something, then I implied it. Such as making reference to an alternate University of Texas, which is not at Austin, and saying that no one cares about it. Thus, my implied statement given above explicitly for the retarded.

      --

      I am unamerican, and proud of it!
  6. Branching by shmlco · · Score: 2, Interesting

    The article states that this works by sending blocks of up to 128 instructions at a time to the processor, where "The processor "sees" and executes a block all at once, as if it were a single instruction..." Makes you wonder if they'd ever get close to that target, as IIRC, one instruction in seven on average is a conditional branch.

    --
    Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
    1. Re:Branching by Anakron · · Score: 4, Informative

      Branches can be predicted with fairly high accuracy. And most new architectures have some form of speculation in the core. And they actually execute 16 instructions at once. Only their word is 128 instructions long.

      --
      There are 11 types of people. Those who understand binary, those who don't and those who are sick of this lame joke.
    2. Re:Branching by Nick+Nethercote · · Score: 1

      The instruction set supports predication, i.e. conditional execution of instructions. It lets you do control flow without branches.
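
      For anyone who hasn't met predication, here is a rough Python picture of the idea. It is only a sketch and says nothing about how TRIPS encodes predicates: the predicate is computed like any other value and then selects a result, so there is no jump. It is modelled here as "compute both sides, keep one"; real predicated hardware instead suppresses the instruction whose predicate does not match. Both function names are made up.

```python
def abs_diff_branching(a, b):
    """Conventional version: control flow decides which subtraction runs."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    """Branch-free version: the predicate is data, not a jump target."""
    p = a > b              # predicate, computed like any other operand
    when_true = a - b      # result guarded by p
    when_false = b - a     # result guarded by not p
    # Arithmetic select: exactly one of the two terms survives.
    return p * when_true + (not p) * when_false
```

      The two versions return the same value for every input; the second simply trades a possible branch misprediction for some wasted work.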

  7. but... by chiseen · · Score: 0, Offtopic

    can i cook eggs with my heatsink?

  8. Loops as functions? by ReformedExCon · · Score: 4, Interesting

    We can understand easily how a loop could be calculated as a function, if the contents of the loop block are composed solely of calculations. When this occurs, the output of the loop is simply a function of its input (f(x), if you will). However, computer scientists who think that programs can always be reduced to a simple function with given inputs have their heads too far in their books to see how the real world forces programs to be far removed from that ivory tower gobbledygook.

    In the real world, you aren't typically performing calculations in loops. Rather, you are usually reading and writing to memory, which may or may not be cached. So it isn't just a matter of saying f(x), it is much more complicated and possibly dependent on memory which you have no way to determine until the loop iteration reaches that point. And then you'll still get the bottlenecks which plague us today. Memory isn't fast enough, devices aren't fast enough, too much time is spent waiting for I/O to complete.

    Pushing as much brute-force computation off onto compilers is fine. Let them unroll loops and optimize functions. But what are the limits to this? Can we really optimize our way to 1-step loops? I don't think so, but the DOD seems to think it is possible.

    --
    Jesus saved me from my past. He can save you as well.
    1. Re:Loops as functions? by spuzzzzzzz · · Score: 1

      Can we really optimize our way to 1-step loops?

      Of course we can. Just have a look at your favourite functional programming language; it probably doesn't even have a loop construct. The question is whether this can be done efficiently. Of course, it also requires programmers to think in a different way, which they tend to be reluctant to do.

      --

      Don't you hate meta-sigs?
    2. Re:Loops as functions? by Anonymous Coward · · Score: 0
      have their heads too far in their books to see how the real world forces programs to be far removed from that ivory tower gobbledygook.

      Too bland. You should try to work more rhetoric and innuendo into your argument. It makes it so much more persuasive.

    3. Re:Loops as functions? by AuMatar · · Score: 1

      Just because the programming language hides the loop doesn't mean it isn't there. The processor itself is still executing a loop, whether you use a loop, recursion, or some sort of assignment concept.

      --
      I still have more fans than freaks. WTF is wrong with you people?
    4. Re:Loops as functions? by spuzzzzzzz · · Score: 1

      Only because the processor instruction set is designed in such a way that loops are necessary. I'm not at all sure about this, but I wouldn't be surprised if Lisp Machines, for example, didn't have hardware support for loops.

      In order to see that loops are completely unnecessary, you only need to see that the lambda calculus is Turing complete.

      --

      Don't you hate meta-sigs?
    5. Re:Loops as functions? by AuMatar · · Score: 1

      To see that they are completely necessary, you just need to understand processors at a gate level.

      You can only write so much data at once- the length of the smallest bus. To write more than that, you need to issue repeated write instructions. This means a loop of some sort, deciding how many write instructions to output. You can put all the fancy math you want on top of it- the hardware is implementing a loop.

      --
      I still have more fans than freaks. WTF is wrong with you people?
    6. Re:Loops as functions? by spuzzzzzzz · · Score: 1

      To write more than that, you need to issue repeated write instructions. This means a loop of some sort

      No, it can be done as a recursive function. There will be no difference in terms of electric currents running across silicon because the logic involved is exactly the same. But if the processor presents a functional (i.e. function-based) interface and uses a functional design, it will use recursion rather than looping to issue repeated instructions.

      the hardware is implementing a loop

      The hardware is implementing electric currents; the logic at that level is so simple that it doesn't deal with the concepts of loops or functions. It is only when you reach a higher level of abstraction that the difference between functional and imperative designs actually makes a difference. I would argue that the level at which the distinction becomes important is the processor ISA level; I also think the view that the distinction kicks in slightly lower is defensible but I don't agree that the logic at the gate level is important.

      --

      Don't you hate meta-sigs?
    7. Re:Loops as functions? by AuMatar · · Score: 1
      A recursive function is a loop, implemented in an extremely inefficient manner. You add/remove something to the stack each iteration of the loop instead of incrementing/decrementing a counter or twiddling a conditional variable. It's still a loop. You're still repeating the same code over and over. That's the definition of a loop. You don't need a special keyword to implement it, using gotos to loop is still a loop (and a function call is just a goto with a stack push).
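
      A small Python sketch of the point being argued, with made-up names (`bus` is just a list standing in for a sequence of write operations): the iterative and recursive versions issue exactly the same writes in the same order, and the only difference is whether the position lives in a loop counter or in a stack of call frames.

```python
def write_all_loop(words, bus):
    """Issue one write per word, tracking progress in the loop itself."""
    for word in words:
        bus.append(word)


def write_all_recursive(words, bus, index=0):
    """Issue one write per word, tracking progress in the call stack."""
    if index == len(words):
        return
    bus.append(words[index])
    write_all_recursive(words, bus, index + 1)


data = [10, 20, 30, 40]
bus_from_loop, bus_from_recursion = [], []
write_all_loop(data, bus_from_loop)
write_all_recursive(data, bus_from_recursion)
```

      Either way, four words cost four writes; neither formulation reduces the number of times the write is performed.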

      I would argue that the level at which the distinction becomes important is the processor ISA level; I also think the view that the distinction kicks in slightly lower is defensible but I don't agree that the logic at the gate level is important.


      It goes up a few levels. In any case, the reason it's important requires you to jump up a few posts. It doesn't matter if you use recursion or a loop or another mathematical construct: you will still be executing the same code multiple times. There's no way around it. You can hide the loop in math, but at a hardware level the machine will still have to do something multiple times, and that something multiple times won't be sped up any (at least not by the techniques discussed here. Things like increasing bus width can reduce it, but have other costs). None of the techniques in this idea will help optimize away loops, which is what the original poster asked. "Can we really optimize our way to 1-step loops?" The answer is no. Not on the hardware level.
      --
      I still have more fans than freaks. WTF is wrong with you people?
    8. Re:Loops as functions? by spuzzzzzzz · · Score: 1

      Oops, I think we've been arguing semantics the whole time. I would define a loop to be a logical construction like a "while" or a "for" in C. If it were written as a recursive function, I would not consider it to be a loop even though it executes the same instructions multiple times.

      There is a small difference, though, between loops as we know them in imperative languages and recursive functions as we know them in functional languages: functions have no side effects, which opens up the possibility of optimisation at a high level. Since side effects in loops are the main obstacle to loop unrolling and other optimisation techniques, I don't think this difference is entirely insignificant.

      --

      Don't you hate meta-sigs?
    9. Re:Loops as functions? by AuMatar · · Score: 1

      We are in some ways. I view everything as the hardware sees it: if you're repeating the same instructions, you're looping. Probably why I'm a C coder at heart; I want to know what's really going on, and anything that abstracts me from that is a hindrance. In the context of this question it's what matters, since we are talking about hardware optimizations in this article.

      You can unroll all you want, it still won't optimize the loop away. There's still a physical limit to the amount of data transferable per operation. You can do all sorts of things to tweak loop efficiency, but you'll only shave off a portion of your original time.

      --
      I still have more fans than freaks. WTF is wrong with you people?
  9. Boring by Rufus211 · · Score: 2, Interesting
    So glancing over the article it doesn't look like they're actually doing anything "new." Basically expanding on register renaming, speculative execution, and the like, which makes the CPU's job slightly easier. Also their bit about data flow and "direct target encoding" sounds oddly like this patent by Cray from 1976 (!).

    Overall they might make some things marginally more efficient, but they aren't solving any fundamental problems. They're simply moving some around slightly.

    1. Re:Boring by Monkelectric · · Score: 1
      So glancing over the article it doesn't look like they're actually doing anything "new." Basically expanding on register renaming, speculative execution, and the like, which makes the CPU's job slightly easier. Also their bit about data flow and "direct target encoding" sounds oddly like this patent by Cray from 1976 (!).

      But they thought up a neat acronym for it, TRIPS! Seriously though, that's how research works ... Cynically we could say they are completely full of it. They also could have some new techniques to add, which if they did, is exactly what research is about. Most research is about incremental improvement at a leisurely pace... god damn I long for academia :)

      --

      Religion is a gateway psychosis. -- Dave Foley

  10. Isn't this what Intel tried to do with Merced? by Anonymous Coward · · Score: 3, Insightful

    I seem to remember that Intel designed Merced (now the Itanium, known colloquially as the Itanic to reflect how well it's gone in the marketplace) to shift the burden of branch prediction and parallelism to the compiler. Or, in other words, the compiler was expected to mark instructions that were capable of running in parallel, and also to state which branches were likely to be taken.

    All a great idea in theory; after all, the compiler should be able to figure out a fair amount of this information just by looking at the flow of data through the instructions (although it may not be so good at branch prediction; I'm not sufficiently strong on compiler theory and branch prediction to talk about that.) However, as can be seen by Itanium's (lack of) market success, the compiler technology just isn't there (or maybe we're using the wrong languages; there are, after all, languages that are designed to be inherently parallel.)

    If this team can get it working the way they want to, maybe -- just maybe -- Itanium will find its niche after all. But let's not kid ourselves; this is a hard problem, and it's more likely that they'll make incremental improvements to the knowledge that's out there, rather than a major breakthrough.

    1. Re:Isn't this what Intel tried to do with Merced? by Anonymous Coward · · Score: 0

      No, this is a good bit more than the Itanic. Intel threw all of the complexity on the compiler. The compiler for the UT machine can schedule instructions with dependencies at the same time and the dataflow logic will cause the execution to occur in the proper order. IIRC, the Itanic was more strictly a VLIW machine, ala the old Trace architecture.

      But the real question... will it run Windows? :-) Too, bad...

      But I'd like one to play with.

    2. Re:Isn't this what Intel tried to do with Merced? by Anonymous Coward · · Score: 0

      Isn't this what Intel tried to do with Merced?

      Don't you mean HP with Play-Doh?

    3. Re:Isn't this what Intel tried to do with Merced? by bushk · · Score: 1

      absolutely not. in fact, it's the opposite of a VLIW architecture (ex: Itanium). simply: VLIW: compiler specifies independence. TRIPS Datagraph Execution Model: compiler specifies dependence.

    4. Re:Isn't this what Intel tried to do with Merced? by ACPosterChild · · Score: 1

      or maybe we're using the wrong languages; there are, after all, languages that are designed to be inherently parallel

      But, are the problems parallelizable?

  11. Reduction in register use by Cave_Monster · · Score: 2, Interesting
    FTA ... Finally, data flow execution is enabled by "direct target encoding," by which the results from one instruction go directly to the next consuming instruction without being temporarily stored in a centralized register file.

    This sounds really cool.

    1. Re:Reduction in register use by Anakron · · Score: 1

      Yes it is. It's also in every good architecture textbook. Not new by any means. Maybe these guys actually DID do something new. However, the article is skimpy enough on details to be nearly worthless.

      --
      There are 11 types of people. Those who understand binary, those who don't and those who are sick of this lame joke.
    2. Re:Reduction in register use by Anonymous Coward · · Score: 0

      Register bypassing is in every architecture book, but not direct target encoding. Bypassing requires control logic to detect when a producing instruction has a consuming instruction somewhere else in the pipeline and to forward the operand to that instruction. The circuitry to accomplish this scales exponentially with the dispatch width of the processor. TRIPS encodes enough information in the instruction to forward the operand without any of this control logic. This is a major bonus in scalability and power consumption.
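
      A toy Python model of the difference, under loud assumptions: the instruction format, `run_block`, and the slot numbering are all invented for illustration and are not the real TRIPS encoding. The point is only that each producer names its consumers' operand slots up front, so results are delivered straight to them and nothing has to search the in-flight instructions for a match.

```python
import operator


def run_block(instructions, initial_operands):
    """Execute a block in which every producer names its consumers.

    instructions:     {id: (op, arity, [(target_id, operand_slot), ...])}
    initial_operands: {(target_id, operand_slot): value}
    Returns {id: result} for instructions that have no targets (block outputs).
    """
    slots = {inst_id: {} for inst_id in instructions}
    for (target, slot), value in initial_operands.items():
        slots[target][slot] = value

    ready = [i for i, (_, arity, _) in instructions.items()
             if len(slots[i]) == arity]
    outputs = {}
    while ready:
        inst_id = ready.pop()
        op, arity, targets = instructions[inst_id]
        result = op(*(slots[inst_id][s] for s in range(arity)))
        if not targets:
            outputs[inst_id] = result
        # Deliver the result directly to each named operand slot.
        for target, slot in targets:
            slots[target][slot] = result
            if len(slots[target]) == instructions[target][1]:
                ready.append(target)
    return outputs


# (1 + 2) * (7 - 3): instructions 0 and 1 both send their result to 2.
block = {
    0: (operator.add, 2, [(2, 0)]),
    1: (operator.sub, 2, [(2, 1)]),
    2: (operator.mul, 2, []),
}
block_outputs = run_block(block, {(0, 0): 1, (0, 1): 2, (1, 0): 7, (1, 1): 3})
```

      Notice that there is no register file anywhere in the model: the intermediate values 3 and 4 only ever exist as operands of instruction 2.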

    3. Re:Reduction in register use by Anonymous Coward · · Score: 1, Insightful

      Having a routed network on which data can travel between function units without being merely copies of data assigned to a register file is not very mainstream (unlike say register bypass). It's not really a new idea, getting it to work well would be though.

    4. Re:Reduction in register use by Yokaze · · Score: 1

      Yeah. It sounds strangely similar to Tomasulo's algorithm. I can't believe that this is the main point of the new approach.

      How about linking to the frickin' homepage of the project

      --
      "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
    5. Re:Reduction in register use by Erich · · Score: 1

      Temporary values can be determined by the chip dynamically, or encoded into existing instruction sets.

      --

      -- Erich

      Slashdot reader since 1997

  12. Is this simply an acronym generator? by Anonymous Coward · · Score: 0

    " What exactly is new?"

    A snazzy acronym.

  13. I don't get it... by i-Chaos · · Score: 1

    I don't get a word he says, and I know a little bit about programming. Can someone dumb this down?

    From what I know, a loop is a loop and you need to satisfy a condition and do some processing. Won't it be a problem if I don't have the data resulting from the last loop before I do the next one?

    --
    ...I am proof that intelligent beings are not always intelligent...
    1. Re:I don't get it... by ReformedExCon · · Score: 4, Interesting

      I alluded to this in my earlier post. Some mathematical operations are simply loops over a seed input. A summation is one example. You can reduce the calculation of a summation from a long series (infinite, perhaps) of functions executed in a loop to a single function which is valid for all inputs (voila, Calculus).
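
      As a concrete instance of that reduction, here is the classic 1 + 2 + ... + n sum written both ways in Python. This only illustrates replacing a loop with a closed form; whether TRIPS or its compiler does anything of the kind is my reading and is not established by the article. The function names are made up.

```python
def sum_by_loop(n):
    """Add 1..n one term at a time: n loop iterations."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_closed_form(n):
    """Same result from the formula n(n + 1)/2: no iterations at all."""
    return n * (n + 1) // 2


loop_result = sum_by_loop(128)
formula_result = sum_closed_form(128)
```

      The catch is that a human supplied the formula here; asking a compiler to discover one for an arbitrary loop body is a much harder problem.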

      So they say they can take loops in 128 blocks at a time and calculate the result in less than 128 loop steps. They are requiring the compiler to come up with a valid function for those 128 steps that will work for any initial parameters. If it works, it means that you are no longer executing 128 times, but only once. That is a speed-up of just over 2 orders of magnitude. Really, really amazing.

      But does it work? Can they really ask the compiler to do that much work? Is the compiler capable of being that smart? The main thing I wonder is how well this works, and how optimized it can get when the main purpose of looping is not to calculate functions but to access memory which is itself not fast.

      --
      Jesus saved me from my past. He can save you as well.
    2. Re:I don't get it... by freidog · · Score: 2, Informative

      only if the subsequent loops are dependent on data from the current loop.

      something like
      for(int i = n-1; i>0; i--){ n = n * i; }

      obviously the new value of n depends on the value for n calculated by the last loop so that might not be a good candidate to try and parallelize. (actually factorial is something that can be written to take advantage of instruction level parallelism (ILP); I chose not to, simply for the example).

      however, if you're doing something that is not dependent on previous loops, various forms of loop unrolling can exploit ILP.
      take for example blending two images
      for each row x and each column y, x++, y++
      imageTarget[x][y] = 1/2 * imageSrc1[x][y] + 1/2 * imageSrc2[x][y]

      one pixel does not depend on the result of the previous, there's no reason you can't do 2, 4, 8, 16 etc. pixels inside each loop.
      Some compilers can take advantage of this already in doing loop unrolling to utilize MMX or SSE (or similar SIMD instruction sets) instructions. It seems like TRIPS is an instruction set designed to aid the processor in finding and exploiting such ILP.
      The usefulness of such massively parallel designs in general purpose computing is debatable I would say. On the whole there tend to be a lot more instructions with dependencies than those without. (obviously everything has some dependencies, I mean in such a manner that prevents ILP / loop unrolling).
      Hardware has been moving towards more parallelism with super-scalar and multi-chip processing and more functional SIMD instruction sets, but software has gone only kicking and screaming into a more parallel world.
      Athlon and Pentium 3, Pentium M can look at up to something like 14 x86 instructions and decode up to 3 of them per clock cycle. More often than not they can't find 3 suitable instructions to decode. I have a hard time believing something is going to find 32 (16 per core, 2 cores on the prototype) for general purpose software.
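
      To spell out the image-blend case above, here is a sketch in Python over a flat list of pixel values (the function names and the unroll factor of 4 are arbitrary choices for illustration). The four statements in the unrolled body do not depend on each other, which is exactly what a SIMD unit or a wide-issue core could exploit; Python itself will of course still run them one after another.

```python
def blend(src1, src2):
    """One pixel per iteration."""
    out = [0.0] * len(src1)
    for i in range(len(src1)):
        out[i] = 0.5 * src1[i] + 0.5 * src2[i]
    return out


def blend_unrolled4(src1, src2):
    """Four independent pixels per iteration, plus a cleanup loop."""
    n = len(src1)
    out = [0.0] * n
    i = 0
    while i + 4 <= n:
        # None of these four lines reads a value written by another.
        out[i] = 0.5 * src1[i] + 0.5 * src2[i]
        out[i + 1] = 0.5 * src1[i + 1] + 0.5 * src2[i + 1]
        out[i + 2] = 0.5 * src1[i + 2] + 0.5 * src2[i + 2]
        out[i + 3] = 0.5 * src1[i + 3] + 0.5 * src2[i + 3]
        i += 4
    while i < n:  # leftover pixels when n is not a multiple of 4
        out[i] = 0.5 * src1[i] + 0.5 * src2[i]
        i += 1
    return out
```

      The factorial loop earlier in this post cannot be unrolled this way as written, because each multiply needs the n produced by the previous one.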

    3. Re:I don't get it... by nzkbuk · · Score: 1

      Memory is the other thing they are trying to sort out.
      Or more specifically, registers: instead of storing the results from an instruction in a loop while a different instruction executes, then having to access the registers to get the stored data, they execute the instructions as soon as the inputs are ready, thereby reducing the register (internal memory) count.

      When they do that, there is a whole chunk of transistors which can now be removed from the design, or used for computation instead of storage.

    4. Re:I don't get it... by kirinyaga · · Score: 2, Informative
      actually, as I understand it, the following loop :

      for(int i = n-1; i>0; i--){ n = n * i; }

      is probably internally transformed into the following grid in a 10-instructions TRIPS processor :

      read n(transmitted as a & b) => decr a (transmitted as a & d) => comp a,0 => mul a,b (result transmitted as c)
      => decr d (transmitted as d & f) => comp d,0 => mul c,d (result transmitted as e)
      => decr f => comp f,0 => mul e,f

      where a, b, c, d, e & f are buses wiring the instruction-grid cells to each other. Each instruction-grid cell can be viewed as a little processor without registers that performs the instruction it has been programmed for as soon as data is present on its inputs.

      You can see in the previous example there is a fair amount of concurrency even with such a simple loop. The "new" thing is that the loop unrolling is done by the hardware, not the compiler.
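
      The "fire as soon as the inputs are present" behaviour can be sketched in a few lines. This is a toy model, not how TRIPS is actually implemented: Python is chosen only for illustration, the node names (d1, c1, m1, ...) are hypothetical labels for the decrement, compare and multiply cells of three unrolled iterations, and every node is assumed to take exactly one step.

```python
def run_dataflow(nodes, inputs):
    """Fire every node whose operands are available, step by step.

    nodes:  name -> (function, list of operand names)
    inputs: name -> value available before the first step
    Returns (values, steps), where steps[k] lists the nodes fired in step k.
    """
    values = dict(inputs)
    pending = dict(nodes)
    steps = []
    while pending:
        ready = sorted(
            name for name, (_, operands) in pending.items()
            if all(op in values for op in operands)
        )
        if not ready:
            raise ValueError("no node can fire: missing input or cyclic graph")
        # Compute all results before publishing any of them, so nodes that
        # fire in the same step only see values from earlier steps.
        fired = {
            name: pending[name][0](*(values[op] for op in pending[name][1]))
            for name in ready
        }
        values.update(fired)
        for name in ready:
            del pending[name]
        steps.append(ready)
    return values, steps


def decr(x):
    return x - 1


def positive(x):
    return x > 0


def mul(x, y):
    return x * y


# Hypothetical grid for three unrolled iterations of the loop above.
FACTORIAL_GRID = {
    "d1": (decr, ["n"]),  "c1": (positive, ["d1"]), "m1": (mul, ["n", "d1"]),
    "d2": (decr, ["d1"]), "c2": (positive, ["d2"]), "m2": (mul, ["m1", "d2"]),
    "d3": (decr, ["d2"]), "c3": (positive, ["d3"]), "m3": (mul, ["m2", "d3"]),
}
```

      Calling `run_dataflow(FACTORIAL_GRID, {"n": 4})` fires the nine nodes in four steps rather than nine, because each decrement, compare and multiply starts as soon as its own operands exist.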
      --
      Kirinyaga
    5. Re:I don't get it... by RootsLINUX · · Score: 4, Interesting

      I recommend you read this paper. It gives a great overall picture of what TRIPS is all about and is actually really cool. (I read it about a year ago).

      I am an ECE grad student at UT Austin so I know TRIPS quite well. In fact I often speak with Doug Burger himself because he's the faculty advisor for the UT Marathon team, of which I am a member. (By the way, his name is "Burger" not "Berger"). I think TRIPS is an awesome concept and it's exactly the kind of project that I wanted to be a part of when I became a grad student at UT. I also know Steve Keckler because I'm taking his advanced computer architecture course this semester, and we're actually spending a good chunk of time talking about TRIPS (course schedule).

      --
      Hero of Allacrost, a FOSS RPG for *NIX/*BSD/OS X/Win
    6. Re:I don't get it... by Krach42 · · Score: 1

      Someone else above suggested that the design is not a generic CPU but rather a vector-like CPU, and thus more oriented towards DSP and HPC.

      In those cases, a lot of the workload fits the features you speak of well.

      Now, true, the average home PC probably doesn't do anything close to the kind of loop optimization they're talking about.

      That's the reason why most home PCs right now don't usually need dual procs (they don't usually execute multi-threaded apps) or HPC-oriented procs like the Itanium. (If you look at the GMP webpage, it rocks the pants off of processors at 3 to 4 times the MHz, by a factor of 3 or 4. But still, for general CPU consumption, not so good.)

      --

      I am unamerican, and proud of it!
    7. Re:I don't get it... by Anonymous Coward · · Score: 0

      you are a fucking bitch faggot . FAGBURGER up your ass

    8. Re:I don't get it... by Alsee · · Score: 1

      I just read their spec paper. It *is* a generic CPU and it will provide a decent boost to even a normal non-threaded, non-parallel home application. Not a huge multiplier, but a better boost than the current deep pipelined fry-an-egg-on-your-CPU approach. It can dig out almost all of the potential concurrency that is hiding in even the most linear application.

      And if you *do* have a multithreaded system then one chip can run up to 8 threads at once. And if you *do* have some heavily parallel code, like graphics or what not, then you can run a peak of 32 ops in parallel on a single chip (even 32 floating point ops).

      So it does look interesting. It's flexible, speeding up normal code and giving big multipliers when big parallel multipliers are available.

      The prototype is to be a 533 MHz dual-core chip with a 4x4 grid in each core. And that's at a klunky "old" 130 nanometer chip process. They intend the production chip to use a 35 nanometer chip process and be massively faster. The 35 nm process can pack in the transistors at 13.8 times the density of the 130 nm process.
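
      For anyone who wants to check the arithmetic, here is a quick sketch in Python (chosen only for illustration). It assumes idealized scaling, where transistor density goes with the inverse square of the feature size, and it assumes every grid node completes one op per clock at peak; neither assumption comes from the paper, they are just the simplest reading of the numbers above.

```python
def density_gain(old_nm, new_nm):
    """Idealized density ratio: area per transistor scales with size squared."""
    return (old_nm / new_nm) ** 2


def peak_ops_per_second(cores, grid_rows, grid_cols, clock_hz):
    """Peak rate, assuming one op per grid node per clock cycle."""
    return cores * grid_rows * grid_cols * clock_hz


if __name__ == "__main__":
    print(round(density_gain(130, 35), 1))                # density ratio
    print(peak_ops_per_second(2, 4, 4, 533_000_000))      # prototype peak
```

      Under those assumptions the 130 nm to 35 nm shrink gives (130/35)^2, which rounds to 13.8, and the prototype peaks at 2 x 16 x 533 million, or about 17 billion ops per second, so the "teraop" in the name is clearly a target for the later process rather than the prototype.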

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
    9. Re:I don't get it... by Alsee · · Score: 1

      You're obviously not going to run the 32 sub-units at full throttle on general software. The idea is that you can get 32 times the speed on the sort of CPU-killer parallelisable code that desperately needs it, and you can run up to 8 threads in parallel if you have them, and you can speed up even the worst "general purpose" code to probably double or triple the speed of even the most sophisticated current CPU techniques by digging out far more instruction-level parallelism. And you can do it without a blast furnace for a CPU.

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
    10. Re:I don't get it... by Krach42 · · Score: 1

      Ah, thanks for the accurate representation. It does sound nifty the way you put it. :)

      Maybe in the future, I'll try to RTFA instead of trusting other +5 Informative comments. All too often on stuff like this you read opposing information about the technology. *sigh* It'd be cool if moderators would read the specs on stuff like this before moderating Informative, so they'd know whether it was informative or disinformative.

      --

      I am unamerican, and proud of it!
    11. Re:I don't get it... by Alsee · · Score: 1

      Actually TFA was a bit light. I got my info from better links people posted. Here's a good source.

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  14. It uses the LSD technology ... by Articuno · · Score: 5, Funny

    Bugs on the chip can lead to bad Trips

    --
    So Long and Thanks for All the Fish!
  15. In other news. by TCaM · · Score: 3, Funny

    Their NEXT next generations chips will be powered entirely by buzzwords and acronyms.

    1. Re:In other news. by Anonymous Coward · · Score: 0

      > Their NEXT next generations chips will be powered entirely by buzzwords and acronyms.

      How do they handle the thermal issues with excessive hot air and vapourware?

  16. TRIPS Project at UoT by Anonymous Coward · · Score: 1, Informative
  17. Some contradictions in TFA by Gopal.V · · Score: 4, Insightful
    > is that for application software to take advantage of those multiple cores, programmers must structure
    > their code for parallel processing, and that's difficult or impossible for some applications.
    >
    > "The industry is running into a programmability wall, passing the buck to software and hoping the programmer
    > will be able to write codes for their systems," he says.

    So you want the programmer to be unaware of the parallel processing. Then the article goes off and says something stupid IMHO.

    > a huge amount of control logic, control transistors that don't do any work -- they just consume power. Trips is trying to push some of that complexity back up into the compiler

    I thought the point of TRIPS was to make the chip do all the scheduling (ie the Data Flow architecture) rather than depend on the compiler generated sequence of instructions. As a hobbyist compiler dev, I'd like to note that the data flow architecture is the basis of all compiler optimizers (DAG), though the typical compiler dev is likely to use this input to allocate registers to minimize pipeline stalls. I admit that it can be done at the CPU level to some extent - then this is even stranger.

    > Trips compiler sends executable code to the hardware in blocks of up to 128 instructions. The processor "sees" and executes a block all at once, as if it were a single instruction, greatly decreasing the overhead associated

    Somehow this just shifts the hard work of peephole optimisation to the CPU to be done at real time. It would have been far better to do it in the compiler properly - something which needs extra memory and lots more processing than the code that is being executed.

    All in all, I don't see this thing revolutionizing general-purpose programming systems. Though what I call special-purpose programming might be the way the future of programming goes - I'm no Gordon Moore.
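
    As an illustration of the dependence DAG mentioned above, here is a minimal sketch of how a compiler might derive one from a straight-line block. It is in Python for brevity, the tuple format and function names are invented for this example, and it assumes single-assignment code so that only true (read-after-write) dependences matter.

```python
def build_dag(block):
    """Map each instruction index to the indices it depends on.

    block is a list of (dest, op, src1, src2, ...) tuples in program order.
    Only read-after-write dependences are tracked, which is sufficient when
    every destination name is written once (an assumption of this sketch).
    """
    last_writer = {}
    deps = {}
    for idx, (dest, _op, *srcs) in enumerate(block):
        deps[idx] = {last_writer[s] for s in srcs if s in last_writer}
        last_writer[dest] = idx
    return deps


def schedule_levels(deps):
    """Earliest step for each instruction if everything takes one step."""
    level = {}
    for idx in sorted(deps):  # program order is already a topological order
        level[idx] = 1 + max((level[d] for d in deps[idx]), default=0)
    return level


# Hypothetical five-instruction block.
BLOCK = [
    ("t1", "add", "a", "b"),
    ("t2", "add", "c", "d"),
    ("t3", "mul", "t1", "t2"),
    ("t4", "sub", "a", "c"),
    ("t5", "add", "t3", "t4"),
]
```

    For `BLOCK`, instructions 0, 1 and 3 land on level 1, instruction 2 on level 2 and instruction 4 on level 3, so five instructions need only three steps. Whether that scheduling is better done by the compiler or by the chip is exactly the question raised above.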
    1. Re:Some contradictions in TFA by Anonymous Coward · · Score: 0
      the article's a bit short on detail, 'big chunks of instructions scheduled by the compiler' sounds a lot like VLIW that was done to death back in the 90s so I guess there has to be more to it than that. Does sound to me like they've made bypass registers architecturally visible (another LIW trick from the 90s, or at least we did it then - doesn't architecturally scale well though, but if it's LIW it's what you get).

      Reading the article again I suspect it may be that they are going down the VLIW route - someone has to make it work one of these days :-) (by work I mean make a commercial product that flies) - from past experience the right thing to do is to go simple, short pipes, minimal inter-pipe logic, etc etc the usual good stuff

    2. Re:Some contradictions in TFA by Anonymous Coward · · Score: 0
      You should read this paper, linked by another poster. They are definitely designing this as a general-purpose processor.

      And they're explicitly working on making it work well with plain old C code. (I'm still wondering, though, if it might be advantageous to use languages with explicit dataflow support, like Oz.)

    3. Re:Some contradictions in TFA by Nick+Nethercote · · Score: 1
      I thought the point of TRIPS was to make the chip do all the scheduling (ie the Data Flow architecture) rather than depend on the compiler generated sequence of instructions.
      It's static placement, dynamic issue. So the compiler schedules the instructions onto the ALU grid. But then the hardware issues the instructions when their inputs are ready.

      A superscalar chip has dynamic placement and dynamic issue -- the hardware is responsible for deciding which ALU each instruction goes to, and when they get issued. Doing this well requires a lot of transistors and costs power, and the hardware is rediscovering inter-instruction dependencies that the compiler could have told it.

      A VLIW chip has static placement and static issue -- the compiler is responsible for deciding which ALU each instruction goes to, and exactly when it gets issued. This is really hard to do well due to variable latencies of instructions, eg. due to cache misses. It puts too much of a responsibility onto the compiler.

      TRIPS is static placement, dynamic issue. So the compiler schedules the instructions onto the ALU grid. But then the hardware issues the instructions when their inputs are ready. It's intended to find a good middle ground between superscalar and VLIW in terms of what the hardware has to do, and what the compiler has to do.
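
      A rough way to see why dynamic issue helps with variable latencies is the toy model below. It is a deliberately crude sketch in Python, not a description of any real chip: it assumes unlimited ALUs, models static issue as a lockstep machine where every cycle beyond the compiler's assumed latency stalls everything, and models dynamic issue as each instruction starting the moment its inputs are done. All names and latency numbers are hypothetical.

```python
def dynamic_issue_finish(deps, latency):
    """Finish time when each instruction starts as soon as its inputs are done.

    deps:    index -> set of indices it depends on (program order must be
             a valid topological order)
    latency: index -> cycles the instruction actually takes
    """
    finish = {}
    for i in sorted(deps):
        start = max((finish[d] for d in deps[i]), default=0)
        finish[i] = start + latency[i]
    return max(finish.values())


def static_issue_finish(deps, nominal, actual):
    """Finish time for a lockstep schedule planned with nominal latencies.

    Crude assumption: every cycle an instruction runs beyond its nominal
    latency stalls the whole machine, so the stalls simply add up.
    """
    planned = dynamic_issue_finish(deps, nominal)
    stall = sum(max(0, actual[i] - nominal[i]) for i in deps)
    return planned + stall


# Two independent loads (0, 1), each feeding an add (2, 3), then a final add (4).
DEPS = {0: set(), 1: set(), 2: {0}, 3: {1}, 4: {2, 3}}
NOMINAL = {0: 2, 1: 2, 2: 1, 3: 1, 4: 1}   # what the compiler assumed
ACTUAL = {0: 10, 1: 10, 2: 1, 3: 1, 4: 1}  # both loads miss the cache
```

      With these made-up numbers the lockstep model takes 20 cycles, while the dataflow model overlaps the two misses and finishes in 12. Real VLIW and superscalar machines are far more subtle than this, but it shows why deferring the "when" to hardware is attractive even if the "where" is fixed by the compiler.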

      Read the IEEE Computer overview paper http://www.cs.utexas.edu/users/cart/trips/publications/computer04.pdf from 2004 for a much better idea of how it all works.

  18. They have CS Programs in Texas? by putko · · Score: 1

    Does any major piece of software that folks use come from UT?

    I can think of famous projects from MIT, Berkeley, Stanford, CMU, Caltech, Cornell ...

    But I can't think of a single one from UT. Not a single one. Is there something we all use that comes from UT?

    I know they have good petroleum engineering at A&M -- but I'm interested in CS.

    --
    http://www.thebricktestament.com/the_law/when_to_stone_your_children/dt21_18a.html
    1. Re:They have CS Programs in Texas? by Anonymous Coward · · Score: 0

      It turns out that UT's Petroleum Engineering program is ranked higher than A&M's (that is, UT Austin is #1). UT also has the #7 Computer Science program (just a notch below the universities you mentioned). But, nice try anyway.

    2. Re:They have CS Programs in Texas? by Anonymous Coward · · Score: 0

      We don't care about silly numbers published by some magazine written by high school graduates and journalism majors.

      Places like MIT and UC Berkeley are good not because they got some inane ranking in a magazine but because of their proven track record of innovative research that is unparalleled in the world. What has UT Austin done? I certainly can't think of anything.

    3. Re:They have CS Programs in Texas? by Deflatamouse! · · Score: 2, Informative

      Don't look down on the Texans. UT has one of the highest-ranked computer engineering programs in the country. I've heard of Doug Burger before, and we have read his research papers and used his simulators (developed by him and Todd Austin at Wisconsin) in our graduate classes at CMU (I'm BS&MS ECE, CS '01).

      Austin also has a high number of tech companies around - heck, AMD, IBM, Intel, Freescale, just to name a few. It's nicknamed Silicon Hills. UT may not have the legacies like that of MIT, CMU, Berkeley, Stanford, but they got a heck of a program going on there and they are catching up. Hook'em Horns!

    4. Re:They have CS Programs in Texas? by putko · · Score: 1

      Exactly. What has UT produced that folks use?

      MIT -- Kerberos
      Berkeley - RISC, BSD Unix, RAID, TCP/IP networking as standard OS feature
      Stanford -- RISC
      Cornell -- distributed systems research
      Caltech -- Carver Mead (VLSI, machine vision)

      UT -- ?????????

      --
      http://www.thebricktestament.com/the_law/when_to_stone_your_children/dt21_18a.html
    5. Re:They have CS Programs in Texas? by putko · · Score: 0, Redundant

      Don't look down on the Texans. UT has one of the highest-ranked computer engineering programs in the country. I've heard of Doug Burger before, and we have read his research papers and used his simulators (developed by him and Todd Austin at Wisconsin) in our graduate classes at CMU (I'm BS&MS ECE, CS '01).

      I didn't ask about how well their program is rated. Has UT produced any programs that people use? E.g.

      MIT -- Kerberos
      Berkeley - RISC, BSD Unix, RAID, TCP/IP networking as standard OS feature
      Stanford -- RISC
      Cornell -- Ensemble, Horus, Spinglass -- distributed programming toolkits
      Caltech -- Carver Mead (VLSI method, VLSI tools, machine vision)


      UT -- a simulator that you've used. What sort of simulator, please?

      E.g. Berkeley developed SPICE, a tool used to simulate circuits. Last I heard, it was the standard tool to use for that stuff. Here's the project page: http://bwrc.eecs.berkeley.edu/Classes/IcBook/SPICE/

      Austin also has a high number of tech companies around - heck, AMD, IBM, Intel, Freescale, just to name a few.

      I didn't ask what firms work there. I want to know what software people at UT have made that is worth talking about.

      --
      http://www.thebricktestament.com/the_law/when_to_stone_your_children/dt21_18a.html
    6. Re:They have CS Programs in Texas? by Anonymous Coward · · Score: 0

      Interesting that you replied to yourself and answered some of your own questions. Karma whoring night? Multiple personality?

    7. Re:They have CS Programs in Texas? by Anonymous Coward · · Score: 0

      UT -- a simulator that you've used. What sort of simulator, please?

      SimpleScalar, the most widely used simulator for microprocessor research.

      Seriously, Kerberos? RISC? RAID? That's so 80's. UT has seminal results, but not in the _systems_ fields. More in the theory and AI fields (e.g. ACL2). Yale Patt's group did the formal verification of Intel's Floating Point unit after the FDIV bug. I do research on computer architecture, and TRIPS has had significant impact in the architecture research community over the past 3 years.

      You forget that universities are not corporations, and there are not always going to be obvious products coming from research groups; research groups produce conference and journal papers, Microsoft and Google write software.

    8. Re:They have CS Programs in Texas? by Anonymous Coward · · Score: 0

      Well, for example they developed ACL2 which is used by many chip companies to verify the very chips/cpu you are probably using right now: http://www.cs.utexas.edu/users/moore/acl2/

    9. Re:They have CS Programs in Texas? by spauldo · · Score: 2, Insightful

      Why wouldn't they have CS programs in Texas?

      What, you think all they teach at Texas universities is agriculture and oil-related subjects?

      Don't judge Texas until you've spent some time there. I hate the place, but I'm from Oklahoma where hating Texas is a requirement of citizenship.

      --
      Those who can't do, teach. Those who can't teach either, do tech support.
    10. Re:They have CS Programs in Texas? by putko · · Score: 1

      Thanks for reminding me of ACL2 -- I'd forgotten about that!

      I mention Kerberos, RISC and RAID because that's what people are using, right now, not because it is the latest and greatest.

      --
      http://www.thebricktestament.com/the_law/when_to_stone_your_children/dt21_18a.html
    11. Re:They have CS Programs in Texas? by putko · · Score: 1

      I've spent plenty of time in Texas.

      UT is a huge system. Upon reflecting on that and their relative lack of released software, I began to wonder if they'd made anything worth using.

      I forgot about ACL2, the only software project I've heard of that comes from Texas.

      --
      http://www.thebricktestament.com/the_law/when_to_stone_your_children/dt21_18a.html
    12. Re:They have CS Programs in Texas? by putko · · Score: 1

      No, I haven't answered myself. I don't have multiple personalities, and it is not karma whoring night.

      Try reading 'flat' and then it will make sense.

      --
      http://www.thebricktestament.com/the_law/when_to_stone_your_children/dt21_18a.html
    13. Re:They have CS Programs in Texas? by Deflatamouse! · · Score: 1

      UT -- a simulator that you've used. What sort of simulator, please?

      It's called SimpleScalar, a superscalar microarchitecture simulator. We have developed trace cache simulators with it in '98-'99 among other things. (Pentium 4 implementation wasn't that great however, it was a high cost cache anyway.)

      Most of the technology that you mentioned was developed 2 decades ago. Its pervasiveness today reflects the years of research and development that have gone into it. RAID was an idea developed by Patterson in the early 90's or late 80's, but only in the last few years can you build RAID arrays cheaply. Some great ideas only made it into the market and to the general masses after years of development. Ask yourself: can you name any recent research, off the top of your head, from the schools that you listed?

      UT has a great deal of research going on - wireless, computer vision/AI, verification, etc.
      Not to mention countless alumni making an impact in the industry. Intel in Hillsboro, OR has a large team of UT alumni, as does HP in Dallas and in Colorado, and of course the firms in Austin that I mentioned.

      Keep up with research papers and what's going on around the industry before you 'judge' a school's worthiness based on your own little world of BSD's and linuxes.

      And actually, you're not quite right about SPICE either. The idea may have originated at Berkeley, and I am not trying to discredit them, but most of the industry has moved on to using tools from Cadence or Synopsys, or develops its own tools to match its own needs.

    14. Re:They have CS Programs in Texas? by Anonymous Coward · · Score: 0

      Columbia -- Kermit

    15. Re:They have CS Programs in Texas? by Anonymous Coward · · Score: 0

      Dude, someone named Dijkstra taught in Texas. Look him up.

      Other people teaching in Texas: the Boyer and the Moore in the Boyer-Moore fast string search algorithm, the guys that developed SNP (the precursor to SSL), the Hoard multiprocessor memory allocator...

      Then there's this guy named Stroustrup...

    16. Re:They have CS Programs in Texas? by Anonymous Coward · · Score: 0

      Well, Hoard memory allocator is one thing at least (http://www.hoard.org/).

  19. The article is too high level by Anonymous Coward · · Score: 1, Insightful
    When it comes to cpu design, such high level articles convey no information at all. It is akin to saying that I'm designing a cpu with 17 pipeline stages, 47-bit instruction words, 713k of L1 cache and 12 general purpose registers... What does all this tell you ? Precisely nothing because it all boils down to what frequency this chip will run at once the design is turned into transistors, how much current it will draw, etc. And this is not something people without vlsi design experience can speculate about.


    1. Re:The article is too high level by lliinnuuxxlover · · Score: 1

      Trust me, most of the time even a VLSI designer cannot accurately predict the final frequency/power.

      --
      This Post was entirely made up of recycled electrons making up recycled signals to generate recycles ASCII to generate t
    2. Re:The article is too high level by Deflatamouse! · · Score: 2, Interesting

      This is so true. We have designs that are broken on paper but work perfectly fine in silicon. But of course, on paper we assume the worst case for most things, which is probably overly pessimistic.

      What ends up happening is that parts are cherry picked before they're sold (with the costs passed down to the customers) or that the parts are binned and sold at different levels such as the case for Intel chips.

      Increasingly, methods to improve yield rates drive some of the design decisions, sometimes even at the architectural level, especially as the processes continue to shrink.

    3. Re:The article is too high level by KillerBob · · Score: 1, Insightful

      What does all this tell you ? Precisely nothing because it all boils down to what frequency this chip will run at once the design is turned into transistors, how much current it will draw, etc.

      You're a product of Intel's marketing. AMD has been able to consistently produce systems that meet or beat Intel's performance with half the clock speed, because they have better instruction pipelining. (if only they could fix their manufacturing problems....)

      Frequency amounts to squat in the final evaluation. Sure, it makes a small difference, but what makes a significantly bigger difference is how well the instructions are optimized and how much of the really fast RAM there is (L1, L2, and L3 cache are basically just RAM that's *way* faster than your system memory). Of course, even with a perfectly optimized CPU, you could still encounter performance issues if you have a crappy system bus and chipset. It's the combination of all of these things together that makes for a faster system.

      Oh, and the power draw has very little direct effect on the performance of your system. It affects the heat buildup, which affects how often and how fast your fan needs to run, which in turn affects the noise your system generates. In some cases, it can also affect the EM interference within the computer itself, though admittedly nowhere near as much as the PSU itself and any moving components like floppy, HDD, and CDROM do. As for the actual performance of a system, having a lower power draw means that with the same cooling setup, you can run at a higher clock speed, but as we've established, there are other ways to improve the system performance than increasing the clock speed, and they have a much bigger impact.

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    4. Re:The article is too high level by Anonymous Coward · · Score: 0
      This has nothing to do with Intel marketing. If frequency amounts to squat in the final evaluation as you say, why don't you just underclock your 2GHz AMD (or Intel) CPU to 700MHz ? Because of its architecture, AMD chips perform well at a lower frequency, but there is still a frequency that they need to reach to be competitive. If it runs at 50MHz, it doesn't matter how wonderfully it's architected.
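
      The disagreement here comes down to one formula: run time is instruction count divided by (instructions per clock times clock rate). A small Python sketch, with made-up chips and numbers that are not measurements of any real AMD or Intel part:

```python
def run_time_seconds(instructions, ipc, clock_hz):
    """Time to retire a fixed number of instructions.

    ipc is the average instructions completed per clock cycle, which is
    where the architecture (pipelining, caches, etc.) shows up.
    """
    return instructions / (ipc * clock_hz)


if __name__ == "__main__":
    work = 3_000_000_000  # hypothetical workload: 3 billion instructions
    print(run_time_seconds(work, 1.5, 2_000_000_000))  # lower clock, higher IPC
    print(run_time_seconds(work, 1.0, 3_000_000_000))  # higher clock, lower IPC
    print(run_time_seconds(work, 4.0, 50_000_000))     # great IPC, 50 MHz clock
```

      In this toy comparison the first two chips tie at one second, which is the "half the clock speed" argument, while the 50 MHz chip takes fifteen seconds even with an IPC of 4, which is the point being made above. Neither factor can be ignored.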


      And what makes you think that I was concerned about its power draw's impact on performance? I was not! Just remember Steve Jobs' "performance per watt" statement. Power draw matters to "people".

    5. Re:The article is too high level by corngrower · · Score: 1

      Judging from the responses to the article, the article was already written at a level that was too high to be understandable for a sizeable portion of the slashdot crowd. Many of them do not appear to understand what dataflow computing is.

  20. Obligatory question by Ray+Alloc · · Score: 0

    Does it run Windows?

    No joke, I'm sure some asshat will eventually ask that one seriously.

  21. Old idéas? by Anzya · · Score: 2, Insightful

    This looks to me to be a combination of old and not so good ideas.
    I read about out-of-order execution and using data when ready at least 5 years ago in Hennessy and Patterson's book "Computer Architecture: A Quantitative Approach". To me it sounds like a typical scoreboarding architecture.
    And how he can claim that this will lead to less control logic, someone else might be able to explain to me.
    As for executing two instructions at once because their destination and value are the same, that sounds like an operation that will lead to more control logic. Besides, don't most compilers optimize away these kinds of cases?

    --
    "This message was brought to you by Sarcasm and Troll Feeders United (or STFU, for you un-hip people)."
  22. VLIW (superscalar) ? by silverbyte · · Score: 3, Interesting

    Is it just me, or does this approach sound very similar to the VLIW (http://en.wikipedia.org/wiki/VLIW) architecture? The problem is that the branch prediction needs to be very accurate for any kind of performance boost.
    Which is why these types of architecture lend themselves very well to sequences of operations that are very similar (video processing, etc.).
    Will this work just as well in the general-computing sphere? No idea.

    1. Re:VLIW (superscalar) ? by bushk · · Score: 1

      Similar only in the following: an effort to provide higher instruction-level concurrency while minimizing hardware control logic. Exactly opposite as follows: VLIW instruction words articulate independence; TRIPS instructions articulate dependence. Hardly a conventional superscalar. TRIPS operates on an explicit data graph execution paradigm.

  23. Re:Boring (article, not project) by Rufus211 · · Score: 4, Informative
    So after looking into their project page I realized I actually saw a presentation given by these people last year. The article makes this sound like something it completely is not. Basically it's a grid of functional units that can connect to their neighbors. You "program" the chip by telling nodes 1 and 2 to take inputs and invert them, then feed the outputs to node 3, which then multiplies the two inputs. Really it's a glorified DSP that has some interesting programmability. Their code analysis to generate the DSP code and then schedule it across a 3D matrix (2D function array x time) will certainly be interesting.

    What this is *not* in any form is a general purpose CPU. It won't boot linux, plain and simple. This is for doing stream data processing such as compression or HPC simulations. I seem to remember in their presentation showing a prototype doing software-radio at a data rate usable for 802.11.

  24. dependent by DrSkwid · · Score: 1

    dependent, dependent, dependent

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  25. TRIaPS not TRIPS by Anonymous Coward · · Score: 0

    Brooklyn, the FAMOUS anonymous poster writes:

    "Teraop Reliable Intelligently Adaptive Processing System"

    That's TRIAPS, not TRIPS. :)

    1. Re:TRIaPS not TRIPS by Anonymous Coward · · Score: 0

      Why isn't anyone bumping this? Nazis!

  26. Maybe Linux.... by fprog26 · · Score: 1

    >Does it run Windows?

    Nope, but it runs on Linux though!

    ROTFL

    No Joke, but you could:

    1. Volunteer yourself,
    2. Buy this Titanic II TRIPS chip,
    3. Port GCC to it,
    4. Compile Linux,
    5. Be a hero!
    ???
    6. Sorry, No profit.

    1. Re:Maybe Linux.... by nsmike · · Score: 1

      Um... Wouldn't changing the instruction set make anything written for an x86 instruction set unusable on the Trips architecture? Unless the new Trips instruction set INCLUDES x86 instructions as well as their own special instructions?

  27. Captain Trips? by Anonymous Coward · · Score: 0

    Trips ... check
    Texas ... check

    Yup, We all know now they're producing bioweapons... Stephen King is a goddamn prophet.

    Seriously, "The Stand" is a good book, but not really great (like "It").

  28. Re:Bad Trips ... by amcdiarmid · · Score: 1

    The article alludes to executing large numbers of instructions simultaneously, like creating new pathways in the brain that make certain modes of thought more efficient. If it works, the shortcuts will avoid many program loops that would normally take processing time, and make the trip shorter.

    I suppose the whole thing will have to be ACID compliant;)

  29. When are we gonna actually see this? by kerohazel · · Score: 1

    I wonder how long it's going to take these innovations to catch on in mainstream computing? Given that most desktops are still running on architectures burdened by 30-year-old design practices... I'd just like to see RISC finally embraced to the degree it deserves. That alone would certainly open up a lot of innovative designs that aren't feasible with the x86.

    --
    Skype is too convoluted... Now I'm reverse-engineering the Kyoto Protocol.
  30. Parallel processing by pubjames · · Score: 4, Insightful

    I had an interesting discussion with a chip designer the other day. We were talking about parallel processing, and I spouted the usual perceived wisdom "But isn't the problem with parallel processing that many problems are very difficult or impossible to do in parallel? And isn't programming in parallel really difficult?"

    I found his answer very interesting, something like: "That line of thinking comes from when computers weren't fast enough to do the basic things we wanted them to do. It's true, an application like a word processor is not a good problem to tackle with parallel processing - but we don't need to these days. Nearly all the stuff we want to do today - faster graphics, 3D, video, image and sound processing, processing massive amounts of data on the web, all the processing that goes into keeping the internet and telephone networks going - all of these problems are ideal for parallel processing. What Google does - that's essentially parallel processing, isn't it?"

    That kind of changed my perception of things and made me realise my mindset was way out of date.

    1. Re:Parallel processing by TecKnow · · Score: 1

      You're not running Google's backend at home, are you? The carefully cherry-picked sample of tasks provided doesn't represent an accurate cross section of the things computers are used for. Your car's engine isn't doing a lot of sound processing, is it? Your cell phone isn't doing a lot of 3D graphics?

      I won't try to downplay the application of parallel processing in many professional or academic situations, but the demands of consumers and realtime systems have a few things in common: they both seek to optimize response to relatively simple yet incredibly arbitrary tasks. Parallel processing is next to useless for that.

    2. Re:Parallel processing by pubjames · · Score: 1

      Sorry, but you're just exhibiting the kind of mindframe that I used to have.

      Your car's engine isn't doing a lot of sound processing, is it? Your cell phone isn't doing a lot of 3D graphics?

      No, but parallel processing isn't needed for those tasks.

      The carefully cherry-picked sample of tasks provided

      So you think "faster graphics, 3D, video, image and sound processing, processing massive amounts of data on the web, all the processing that goes into keeping the internet and telephone networks going" is a cherry picked list?

      The point is, the tasks that we have these days that require massive amounts of processing are nearly all very suitable for parallel processing. The examples you give don't require massive amounts of processing and, as you say, parallel processing is not a good solution for them.

    3. Re:Parallel processing by TecKnow · · Score: 2, Informative
      Keep in mind that Google doesn't rely on the newest, fastest chips; they instead rely on numerous inexpensive parallel systems. Read their tech page if you don't believe me (http://www.google.com/intl/en/corporate/tech.html), so "processing massive amounts of data on the web" doesn't really apply. In fact there are no or nearly no problems in computer science that benefit from using multiple cores or CPUs in a way that can't be replicated using a cluster or grid approach. While advances are always welcome, this area doesn't exactly cry out for hardware innovation.

      Parallelism-on-a-chip doesn't let us do anything we couldn't already do, and most applications that benefit from it are outside the domain of the general consumer. Faster graphics, sound, sure, these things might benefit from on-a-chip parallelism, but consider how many PCs the average consumer has. Now consider the number of embedded processors they have, whether they know it or not: their vehicle, the HVAC system in their home, cell phones, radios, televisions and so on. Clearly, embedded processors vastly outnumber PC processors and, as I said, essentially none of these benefit from parallel computing.

      Now let's consider the benefits of hardware advances in embedded systems/realtime technologies. The smaller and faster a DSP chip can be, the smaller your cell phone can be, and the more information can be packed into a limited signal bandwidth, just as one example. Sounds good to me.

      Now let's consider the benefits of hardware advances in parallelism-on-a-chip. Very few, because corporations can string together many cheap PCs, while outside of video games consumers don't benefit much from parallelism.

      Considering the availability of a cheap and effective substitute for parallelism on a chip, the relative prevalence of embedded systems, and the difference in potential gains from advances in each field, yes, I would say that any list that entirely discounts embedded/realtime systems is 'cherry-picked.'

    4. Re:Parallel processing by pubjames · · Score: 1

      Sorry, you don't seem to understand my point. Forget it.

    5. Re:Parallel processing by tepples · · Score: 1

      Your car's engine isn't doing a lot of sound processing, is it?

      But couldn't it do something like signal processing in order to use fuel most efficiently?

      Your cell phone isn't doing a lot of 3D graphics?

      Even if it's an N-Gage?

    6. Re:Parallel processing by CaptainFork · · Score: 1
      I would go further than that.

      Even within apps like word processors, the slow parts are inherently optimisable via parallelism. For example, rendering the screen display is certainly parallelisable (e.g. word-wrap is an independent problem for each paragraph). The "behind-the-scenes" slowness of many apps is due to dynamic linking, which is also parallelisable.
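
      A minimal sketch of the word-wrap point, in Python (the language, the function names and the 40-column width are all assumptions for illustration, not anything from a real word processor). Each paragraph is wrapped without looking at any other, so the work can be handed to a pool of workers and reassembled in order. In CPython the threads here will not actually run faster because of the interpreter lock; the sketch only shows that the problem splits cleanly.

```python
import textwrap
from concurrent.futures import ThreadPoolExecutor


def wrap_paragraph(paragraph, width=40):
    # Wrapping one paragraph needs no information from any other paragraph.
    return textwrap.fill(paragraph, width=width)


def wrap_document(text, width=40, workers=4):
    # Hypothetical helper: paragraphs are assumed to be separated by blank lines.
    paragraphs = text.split("\n\n")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the document reassembles correctly.
        wrapped = list(pool.map(lambda p: wrap_paragraph(p, width), paragraphs))
    return "\n\n".join(wrapped)


document = ("word " * 30 + "\n\n" + "another " * 20).strip()
parallel_result = wrap_document(document)
serial_result = "\n\n".join(wrap_paragraph(p) for p in document.split("\n\n"))
print(parallel_result == serial_result)
```

      The parallel and serial versions produce the same text, which is the property a real implementation would rely on.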

      The problem is that none of the frameworks for parallel programming are used by apps writers, and none of the apps writers' frameworks address basic efficiency considerations, let alone parallelism.

      Maybe embarrassingly parallel (and relatively simple) cases like sound and graphics will drive the required software frameworks, which can then be applied more widely.

      The question we really should be asking is, what is the best abstraction of parallel hardware: it should apply to as much actual hardware as possible, but complicate software development as little as possible.

    7. Re:Parallel processing by rufty_tufty · · Score: 1

      Name me an embedded system that can't benefit from parallelism?
      My mobile phone is already parallel as it has 2 processors in it plus a smeg load of hardware computation assist. So cell phones already do benefit from this and could benefit from more. Look at the work Xilinx is doing with TI on getting into base stations and you'll see how mobile phone algorithms can be parallelised.
      A printer could certainly benefit from parallelism: get it right and you could render every character on the page at the same time. And we know how easily other graphics on the page can be processed in parallel.

      TVs? Well, much of the processing is done in hardware (well, it is on the TV chip I'm designing at the moment) so it is already parallel, and the processor is a multi-threaded processor to aid this. So TVs do benefit from this already and could benefit from it more.
      Radios? Again, the last chip I designed was a DAB radio chip and again that was a multi-threaded processor with hardware assist.

      I can't think of any other embedded systems that are CPU constrained that couldn't obviously benefit from mass parallelism. If the current chips already benefit from this, how can you say that they can't?

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
    8. Re:Parallel processing by TecKnow · · Score: 1
      The short version: What I'm really arguing here is that multi-core single processors are not fundamentally superior to simply having two chips working in parallel, and that two chips or cores working in parallel are never as good as a single chip that simply goes twice as fast. Having multiple chips in a single embedded device only goes to support that argument. If you can think of a case where two identical processors working together can solve a problem faster than a single chip that simply goes twice as fast, please tell me.

      Okay, first let's get our definitions straight. Lots of things have multiple chips on them, and yes, that's very important, but it isn't parallelism in the sense being discussed unless the chips work simultaneously, not sequentially, and nearly interchangeably. The presence of a DSP chip and a more general chip to handle menus and so forth in a cell phone (as one entirely hypothetical example) does not denote parallel computing because they're not, fundamentally, working on the same problem, nor can they easily switch roles. In the sense being discussed here, your CPU isn't parallel processing with your hard drive controller, or even your FPU, because they're specialized parts of a whole, not replications of a functional unit.

      Further, threading is necessary for parallelism, but doesn't inherently mean that an application will benefit very much from it. The overhead associated with interprocessor communication means that the increase in speed you get with additional processors is always at least slightly diluted, and due to threading management issues, you often end up waiting on other processors/cores. It may get you the answer to pressing questions faster, but only at the cost of lower utilization, which pretty much equates to wasted chip space.

      Finally, none of that is even what is being discussed here. The issue is parallelism-on-a-chip, or multi-core processing. You haven't provided a single example where a dual core chip is preferable to a single core chip that simply goes twice as fast. As I said, there are overheads associated with multiprocessing even in an ideal situation (which you almost never, if ever, have) so multi-core parallelism can never even be as good as a single core chip that goes twice as fast.
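
      One way to put numbers on this argument is Amdahl's law, which the post does not name; it is brought in here as an assumption. The sketch below (Python, with a hypothetical function name) compares a dual-core chip against a single core at twice the clock for a few made-up parallel fractions. It ignores communication overhead entirely, so it is the best case for the dual-core side.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only part of the work can be spread out."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A single core clocked twice as fast speeds up all of the work by 2x.
faster_single_core = 2.0

for fraction in (0.5, 0.9, 1.0):
    dual = amdahl_speedup(fraction, cores=2)
    print(f"parallel fraction {fraction:.1f}: "
          f"dual-core {dual:.2f}x vs. double clock {faster_single_core:.2f}x")
```

      Under this model the dual-core chip only ties the faster single core when the work is perfectly parallel, and falls behind otherwise. The model says nothing about memory stalls or context switches, which is where the replies below push back.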

      I realize there are significant technological hurdles to making faster and faster single core chips, but as you said yourself, existing embedded hardware such as your cell phone can already accommodate multiple chips anyway. So far as I know multi-core chips are simply a sort of apology. "We can't make the underlying chips faster, so here, we'll incorporate the work-around most people would use anyway upfront."

      I stand by my original assertion (which I think has been lost) multi core processors do not represent any sort of fundamental advance. That is all.

    9. Re:Parallel processing by rufty_tufty · · Score: 1

      The Short version: I agree with many of the points you make, but I think it's over simplified.

      Let me take the main points as this could otherwise drag out...

      "As I said, there are overheads associated with multiprocessing even in an ideal situation (which you almost never, if ever, have) so multi-core paralellism can never even be as good as a single core chip that goes twice as fast."

      I can see where you're coming from, but I think you underestimate the cost that context swapping has. Let me define some terms:
      I would say that most algorithms can benefit from parallel processing.
      I would say that most systems are built from many algorithms that can be handled in parallel.
      With correct design, mindset and tools, software can be written to take advantage of this massive parallelism.
      We are much more limited by clock speed than by area in modern processors.
      Memory access in modern processors is the main bottleneck for many algorithms.
      Multi-threading in a single core makes better use of one memory because while one task is stalled on data from memory, the processor's resources can be used for another thread that has its data sat in cache.
      I refer to a thread as a very ambiguous concept: it could be a whole chip, it could be an execution context on an individual core.

      If you had as many cores as you had tasks (x of each) and all tasks required the same amount of processing (very unrealistic) then there would be no task swaps. If you then put this same problem onto a processor that was x times faster but only had 1 core then it would be slower because of the task swapping. How much slower would depend on how often it had to swap.
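
      A toy model of that comparison, in Python. Every number and name here is made up for illustration, and the swap cost is assumed not to shrink with clock speed (for instance because it is dominated by refilling caches from memory), which is an assumption rather than something stated above.

```python
def one_core_per_task_time(work_per_task):
    # Each task owns a core, so nothing is ever swapped out.
    return work_per_task


def single_fast_core_time(tasks, work_per_task, speedup,
                          swaps_per_task, swap_cost):
    # All tasks share one core that computes `speedup` times faster,
    # but every task swap adds a fixed cost on top.
    compute = tasks * work_per_task / speedup
    overhead = tasks * swaps_per_task * swap_cost
    return compute + overhead


parallel_time = one_core_per_task_time(work_per_task=100.0)
serial_time = single_fast_core_time(tasks=4, work_per_task=100.0, speedup=4,
                                    swaps_per_task=10, swap_cost=0.5)
no_swap_time = single_fast_core_time(tasks=4, work_per_task=100.0, speedup=4,
                                     swaps_per_task=0, swap_cost=0.5)
print(parallel_time, serial_time, no_swap_time)
```

      With no swaps the two designs tie; every swap tips the balance toward the multi-core layout, which is the dependence on swap frequency described above.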

      Inter-process communication, which you say is a problem, is indeed a problem, but not for all algorithms on all architectures. Take a network processor: often there are dozens of processors, but the architecture is designed for this so there is lots of communication between the cores, and inter-process communication is not an issue.
      As another example of inter-process communication take the transputer - that was even better as it had next to no cost for a context swap too.

      Give me an example of a task an embedded device that is processor limited that couldn't be improved by more memory speed. I'll bet that this could therefore be helped by splitting the dataset up into multiple streams and running these on separate cores.

      I really think all the problems you/we state are flaws in the current dominant architecture, not real limitations as I have worked with systems that have ways round them all.
      Cranking up the clock speed has already hit a hard limit; like it or not, if we want more speed, we have to think parallel. This is not a cheat: most systems are already parallel, so why not make use of this? Why go to the effort of serialising a problem just to put it on a single-thread processor that could be a 10-thread processor?

      But yes, it would be nice to have a 5GHz machine on my desktop, but it's not going to happen any time soon. Simply continuing to crank up the speed is a very simple solution to the problem, but that's hit a limit. The free lunch is over. You may want this to happen, but magic is for what people want; engineering and science deal with what is.
      I suggest a read of this:
      http://www.gotw.ca/publications/concurrency-ddj.htm
      as it puts that argument far better than I could.

      Multi-core may not be a fundamental advance, but you give me a single fundamental advance in computing since the 60s?
      Hey, the 4040 wasn't a significant advance, as all it did was combine the separate parts into a single chip, which had years of precedent anyway, so that's not a significant advance; the microprocessor was not a significant advance by those rules.
      I think we can only judge what is a significant advance in retrospect and so far I'm seeing the mainstream thinking moving towards a more parallel one (which is what is happening) as being the f

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
    10. Re:Parallel processing by TecKnow · · Score: 1
      Thanks, I did in fact RTA you provided. You did answer my request for an example of a situation where parallelism beats a single, faster chip, and you're right, parallel processing does help accommodate the memory bottleneck. I'll consider multi-core parallelism a significant advance when hardware and compilers have developed to the point that, at the very least, threading doesn't change the fundamental contract between a developer and their hardware about the way their code is going to at least appear to behave.

      Right now, it does. Even an experienced parallel programmer in most instances just tries to separate functionality into independent threads as much as possible and throws locks around where needed to try and get the program to behave in anything resembling a predictable way. That's far from ideal, and what's worse, automated tools to locate the points where a threaded programmer may have made a mistake, where parallelism can sneak in and break the programmer's expectations of code, making it inherently unpredictable, don't exist in a form useful for realistic systems. Current tools in common use for Java (the only widespread language with built-in parallel processing support) top out at only a few hundred lines of code, and due to the fact that possible paths of execution increase exponentially with each additional line, they won't be getting better any time soon.
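
      For readers who have not seen the "throw locks around" pattern, here is a minimal sketch. It is written in Python rather than the Java the post mentions, and the class and function names are hypothetical. The increment is a read-modify-write, so without the lock two threads could in principle interleave and lose an update; the lock forces one thread at a time through that step.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is not atomic; the lock makes it so.
        with self._lock:
            self.value += 1


def run(counter, threads=4, increments=10_000):
    def worker():
        for _ in range(increments):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter.value


total = run(Counter())
print(total)
```

      Nothing in the language forces the programmer to remember the lock, and a missing one may only fail occasionally, which is the predictability problem being described.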

      The lack of automated testing and analysis tools just underscores that no one has yet developed a simple, repeatable method of producing efficient, safe, predictable multi-threaded programs. That's not engineering, that's art, and its unpredictability makes it a black art, at that.

      I look forward to the day when compiler technology takes advantage of parallelism in a way that can efficiently parallelize arbitrary code that was written in an implementation-agnostic way, or at least automatically locate and flag potential flaws, but as the article also pointed out, Java is the only language with built-in concurrency support and it's not particularly good, let alone automated.

      Concurrent programming isn't mature. Widespread proliferation of multi-core chips is going to force this technology on rank-and-file developers who are going to look to the eggheads for tools and a methodology to use them, and the eggheads are going to have to say "we don't have it."

      These kinds of chips don't represent a move towards becoming multithreaded, they represent a desire to be multithreaded. What was that you said about magic again?

    11. Re:Parallel processing by rufty_tufty · · Score: 1

      With you all the way there :-)
      Still this isn't my favorite way to do parallel processing - something like the approach
      http://www.celoxica.com/
      use (with FPGAs and C-like parallel languages) is my favorite, but still not mature and going no-where near the desktop yet. Although one group has got linux synthesised into hardware which you have to admit is cool :-)

      Seems like we have the worst of all worlds at the moment. If we could just get a hardware platform that was easy for the compiler to map an inherently parallel language onto and then re-train all the world's programmers to use it :-)

      multi-core processors may be a way to this, they may not, I'm just glad that we're finally on the route to parallelism.

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
  31. dumb question re: branch prediction by cyclomedia · · Score: 1

    while I'm reading TFA could someone explain why branch "prediction" is such a big sticking point in CPU architecture... surely a processor has the compiled code and a bunch of data, it doesn't need to predict anything because it's all laid out. and by that i mean "for... if... break;" processor shouldn't be surprised when it gets to that nested if and reaches the break and has to jump out of the loop cos it's clearly there in the code to start with, it's not like it just magically showed up, is it.

    --
    If you don't risk failure you don't risk success.
    1. Re:dumb question re: branch prediction by TecKnow · · Score: 3, Informative

      Sure, I can try. All of this stuff about branch prediction is basically the result of something called 'pipelining.' The rationale for pipelining goes something like this: an instruction on a modern computer chip is executed in several stages (fetch, decode, execute, and writeback, in an iconic sense). For any particular instruction you can't begin one stage before you've completed the previous stage. Different stages require different hardware on the chip, so in a non-pipelined CPU some parts of the chip are just sitting there much of the time, and that is bad. The reigning solution to this is pipelining. Each of the stages I listed above is segregated, and as an instruction exits one stage, another instruction begins that stage. This is all well and good except, what happens if the instruction being decoded depends on the results of the instruction being executed? The results are unknown, so do you sit and wait? You can get around this problem somewhat by complicating the chip a little to feed the results of in-process computations back to later instructions in the same pipeline that require them. But now you've got a branch, and you can't even tell what instruction to load next until you know what the condition on that branch is going to evaluate to. The best a chip can do in this case is guess (branch prediction), but if you're wrong you have to throw out all the speculative computations you did. Modern processors rely heavily on pipelining, so an incorrect guess can set them back significantly, especially if they make a habit of it.
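
      A back-of-the-envelope cycle count for the four stages named above, as a Python sketch. It assumes one cycle per stage, no data stalls, and that each wrong guess throws away stages minus one cycles of work; those are simplifying assumptions, not figures for any real chip.

```python
STAGES = ["fetch", "decode", "execute", "writeback"]


def cycles_unpipelined(instructions, stages=len(STAGES)):
    # Each instruction passes through every stage before the next one starts.
    return instructions * stages


def cycles_pipelined(instructions, stages=len(STAGES), mispredictions=0):
    if instructions == 0:
        return 0
    # The first instruction needs `stages` cycles to drain; after that,
    # one instruction completes per cycle.
    ideal = stages + (instructions - 1)
    # Assumed penalty: each wrong guess discards the partially done work
    # sitting in the earlier stages.
    return ideal + mispredictions * (stages - 1)


print(cycles_unpipelined(100))                   # no pipelining
print(cycles_pipelined(100))                     # every guess right
print(cycles_pipelined(100, mispredictions=10))  # ten wrong guesses
```

      With a deeper pipeline the per-misprediction penalty in this model grows with the stage count, which is why wrong guesses hurt more on long pipelines.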

    2. Re:dumb question re: branch prediction by tlambert · · Score: 3, Interesting

      Correct prediction keeps your instruction pipeline full. This is particularly important for processors with long pipelines.

      Incorrect prediction results in having to back out CPU state from the speculative execution that has already taken place (this is called "squashing" the mispredicted instructions), and effectively this loses the pipeline slots that were used to perform the mispredicted execution. From an outside perspective, these lost slots look like a pipeline latency.

      (insert rude comment about GCC #pragma branch hinting and [lack of] basic block reordering to avoid cache busting on PPC here)

      -- Terry

    3. Re:dumb question re: branch prediction by corngrower · · Score: 1

      The 'branch prediction' problem isn't a matter of predicting whether or not a branch will occur, it's predicting which of the outcomes the branch will take. The problem occurs in CPUs that are pipelined. In essence, when one instruction is just completing, the processor has already started working on 5 or more subsequent instructions. Now the problem is that in order to do this, it must 'predict' the outcome of a branch instruction. If it gets this prediction wrong, the work that's been done on those subsequent instructions that were already started has to be thrown away, and it's got to start working on the instructions from the correct branch.
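
      How does the processor make that guess? One textbook scheme, not mentioned anywhere in this thread and used here purely as an illustration, is a two-bit saturating counter per branch. The Python sketch below counts wrong guesses for two made-up branch histories.

```python
def simulate_two_bit_predictor(outcomes, state=2):
    """Count mispredictions; states 0-1 predict not-taken, 2-3 predict taken."""
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Saturating update: move one step toward what actually happened.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


# A loop branch: taken 9 times, then falls through once, repeated 3 times.
loop_pattern = ([True] * 9 + [False]) * 3
# A branch that flips direction every single time.
alternating = [True, False] * 15

loop_misses = simulate_two_bit_predictor(loop_pattern)
alternating_misses = simulate_two_bit_predictor(alternating)
print(loop_misses, "of", len(loop_pattern))
print(alternating_misses, "of", len(alternating))
```

      The loop branch is guessed wrong only at each loop exit, while the alternating branch defeats this simple predictor half the time.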

  32. TRIPS Homepage and original announcement by citanon · · Score: 3, Informative
  33. Call me bitter, but... by SoupIsGood+Food · · Score: 4, Interesting

    It seems to me any serious research into microprocessors will be hampered by the fact that it will be completely inapplicable unless it dumbs itself down to ape the x86 instruction set. All current and future processor design advances will be defined as better and faster ways of making modern silicon pretend it's a member of a chip family that was obsolete when the first President Bush was in office. That's not progress. That's just kind of sad.

    Heaven help any researcher if implementing their new chip design requires a new software paradigm that doesn't fit neatly into the OS/Application model, too. We're living in the perpetual now of 2000, and it's some boring shit. I want my future back.

    Bah.

    SoupIsGood Food

    1. Re:Call me bitter, but... by gklyber · · Score: 1

      I have hope for the future. If Google manages to trump Microsoft as the defacto standard application provider, I think we will see less focus on the OS. Web apps and standards-based software architectures could lead to a day when the underlying OS and architecture is not important. This is already the case for most POSIX-based stuff.

    2. Re:Call me bitter, but... by corngrower · · Score: 1

      That's just a load of bunk. Let's see, how many ARM-based 32-bit microprocessors were made last year? Over 500 million, which kind of puts x86 sales to shame. They saw 278 million units in sales in one quarter last year. If you think everything's x86, you've just got your head in the sand.

  34. You've all got the wrong idea by Takahashi · · Score: 5, Interesting

    This is not some boring superscalar! Nor is it some vector processor!

    In fact this is a complete departure from a von Neumann architecture. The architecture is called a dataflow architecture. In one sentence, a dataflow architecture is one where instruction execution is based on the availability of the instruction's inputs, not a program counter.
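
    To make that one-sentence definition concrete, here is a toy interpreter in Python. It is only a sketch of the firing rule; the instruction format and names are invented for illustration and have nothing to do with the actual TRIPS ISA. Instructions are listed in a deliberately "wrong" order to show that listing order does not matter.

```python
import operator


def run_dataflow(instructions, initial_values):
    """instructions maps a result name to (function, [input names])."""
    values = dict(initial_values)
    pending = dict(instructions)
    waves = []
    while pending:
        # An instruction is ready as soon as all of its inputs have values.
        ready = [name for name, (_, inputs) in pending.items()
                 if all(i in values for i in inputs)]
        if not ready:
            raise ValueError("some inputs can never be produced")
        for name in ready:
            func, inputs = pending.pop(name)
            values[name] = func(*(values[i] for i in inputs))
        # Everything in one wave could fire at the same time in hardware.
        waves.append(sorted(ready))
    return values, waves


program = {
    "result": (operator.mul, ["sum", "diff"]),  # listed first, fires last
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
}
values, waves = run_dataflow(program, {"a": 7, "b": 3})
print(values["result"], waves)
```

    There is no program counter anywhere in the loop: "sum" and "diff" fire together because their inputs exist, and "result" fires once both have produced values.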

    The article does a very bad job at conveying the fact that this is a relatively new idea. Like most reporting, they report something that's been in research for some time as a huge breakthrough without describing it at all. Instead it's really just an incremental step in dataflow computing research.

    I work in a lab at the University of Washington on another dataflow architecture. It's a really interesting idea but it will take some time to develop and you're not going to get one on your desk for some years to come.

    1. Re:You've all got the wrong idea by mkramer · · Score: 1

      I expect it'll be longer than a few years before you see it on your desk, because dataflow processors don't really solve desktop computing problems.

      However, you will see them in space, on aircraft, in missiles, GPS units, etc. in the next few years, because they are very attractive solutions to real-time front-end processing problems, especially where size and power consumption is a major concern. Hence why TRIPS is being paid for by DARPA (along with several other dataflow - some more dataflow than others - architectures in the Polymorphous Computing Architectures program).

    2. Re:You've all got the wrong idea by rufty_tufty · · Score: 1

      This is new?
      http://www.ece.cmu.edu/research/piperench/index.html (the oldest working link I can find on this area) is about 6 years old. I did have links that were older (they just don't work anymore)...

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
    3. Re:You've all got the wrong idea by aminorex · · Score: 1

      Dataflow is old news, but there were some fuzzy words in the article which seemed to imply that they were doing some sort of lazy partial evaluation in hardware. That seems like an interesting idea, and one generally applicable to any ISA: Imagine that your compiler could mark the interesting output registers for a basic block, and then the chip could optimize away all of the side-effects! The power savings alone would be enormous, plus you could fill pipelines with just those ops which were actually useful.
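
      In software this idea resembles dead-code elimination driven by a set of live outputs. A minimal Python sketch follows; the tuple instruction format and register names are hypothetical, and it assumes no instruction has effects (such as a memory store) beyond writing its destination register.

```python
def eliminate_dead_code(block, live_outputs):
    """block is a list of (dest, op, src1, src2) tuples in program order."""
    live = set(live_outputs)
    kept = []
    # Walk backwards: an instruction matters only if its result is still needed.
    for dest, op, src1, src2 in reversed(block):
        if dest in live:
            kept.append((dest, op, src1, src2))
            live.discard(dest)
            live.update((src1, src2))
    kept.reverse()
    return kept


block = [
    ("r1", "add", "a", "b"),
    ("r2", "mul", "a", "a"),    # only feeds r4
    ("r3", "sub", "r1", "b"),
    ("r4", "add", "r2", "r2"),  # not marked as an interesting output
]
kept = eliminate_dead_code(block, live_outputs={"r3"})
print([dest for dest, *_ in kept])
```

      With only r3 marked as interesting, half of the block never needs to execute.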

      --
      -I like my women like I like my tea: green-
  35. I'd really like to know... by vought · · Score: 3, Funny

    Is the guy who runs this machine named Captain Trips?

  36. Old technology on hyperdrive by dascandy · · Score: 0, Redundant

    Is it just me or does the article explain '95 technology?

    It tells about loading blocks of instructions at a time (say, a cache line), then executing them whenever the data is available (which is called out-of-order execution).

    In other words, they're going to overclock a Pentium I to 10GHz and add an excess of pipelines to make it reach a teraflop. I could've done that (given the P1 design).

  37. Re:Boring (article, not project) by Criffer · · Score: 3, Insightful

    Really it's a glorified DSP that has some interesting programmability

    Actually, it sounds more like an FPGA. And, since VHDL is Turing-equivalent, it would actually be possible to compile C code (such as the Linux kernel) into a gate array and run it on such a chip.
  38. better to take a look at.... by kwikrick · · Score: 3, Informative

    the homepage for the TRIPS project: http://www.cs.utexas.edu/users/cart/trips/ because the article doesn't do a good job at explaining the idea, which I think is very interesting. It's not mere branch prediction these people are talking about, and it's more than dumb parallel processing. They are basically fragmenting programs into small dataflow networks.

    --
    assignment != equality != identity
  39. A chance for pure functional languages to shine. by master_p · · Score: 2, Interesting

    Pure functional programming languages will see a tremendous boost from architectures like Trips. In functional programming languages, variables are never assigned, thus making it possible for all parts of an expression to be executed simultaneously. With 128 instructions, it is possible that lots of algorithms that take lots of time when executed sequentially, will take constant time with this new architecture: matrix operations, quicksort, etc.
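
    A small Python sketch of what "all parts of an expression executed simultaneously" buys. The tuple encoding of expressions is invented for illustration. With unlimited execution units, the running time is set by the longest chain of dependent operations (the depth), not by the operation count, and that depth depends on how the expression is shaped.

```python
def count_ops_and_depth(expr):
    """expr is a number or a tuple (op, left, right)."""
    if not isinstance(expr, tuple):
        return 0, 0
    _, left, right = expr
    left_ops, left_depth = count_ops_and_depth(left)
    right_ops, right_depth = count_ops_and_depth(right)
    # The two sub-expressions share no state, so they could run at the same time.
    return left_ops + right_ops + 1, max(left_depth, right_depth) + 1


# Summing 1..8 as a balanced tree and as a left-to-right chain.
balanced = ("+", ("+", ("+", 1, 2), ("+", 3, 4)),
                 ("+", ("+", 5, 6), ("+", 7, 8)))
chain = ("+", ("+", ("+", ("+", ("+", ("+", ("+", 1, 2), 3), 4), 5), 6), 7), 8)

print(count_ops_and_depth(balanced))
print(count_ops_and_depth(chain))
```

    Both forms perform seven additions, but the balanced tree needs only three dependent steps while the chain needs seven, so the gain from such hardware depends on how much independence the expression actually exposes.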

  40. Re:Boring (article, not project) by swarsron · · Score: 1
    What this is *not* in any form is a general purpose CPU. It won't boot linux, plain and simple.

    So why is this on slashdot?
  41. Re:Boring (article, not project) by ThinkingInBinary · · Score: 1

    First, I didn't RTFA yet, but I'm just wondering aloud...

    I seem to remember in their presentation showing a prototype doing software-radio at a data rate usable for 802.11.

    So, eventually, would it be possible to build an 802.11[abg] software-defined radio that can sniff (using Kismet of course) all 11 802.11[bg] channels and all however-many-there-are 802.11a channels, all at once? You could scan for all AP's in about 1/10 of a second! If software-defined radio hardware (e.g. receivers, DSP chips, maybe even this chip) become cheap, will we have AP's that communicate on 11 channels at once? (Yeah I know they have MIMO but that's only 3 channels, IIRC, and it doesn't use an uber-l33t software defined radio chip!)

  42. Mozart/Oz candidate by Analogy+Man · · Score: 1

    I wonder if this would handle concurrent programming and constraint-based inference in Mozart far better than existing chip architectures.

    --
    When the people fear their government, there is tyranny; when the government fears the people, there is liberty.
  43. Re:Boring (article, not project) by Anonymous Coward · · Score: 0

    Actually, it sounds more like an FPGA.

    Err, FPGA and DSP describe different things. AAMOF, their DSP could be implemented using a FPGA, so your statement is a bit confusing. DSP merely describes the functionality of a part, FPGA is a specific type of part.

  44. All you need to know about compiler theory by CaptainFork · · Score: 1
    I'm not sufficiently strong on compiler theory...

    All you need to know is that there are some people in this world called compiler weenies. They had over-protective mothers and spend their lives looking for the same kind of protection in the adult world. They worship compilers because compilers protect them from the scary realities of computer architecture and keep them nice and safe in the world of high-level languages.

    Compiler weenies believe, above all else, that a good compiler can always do a better job than any human at optimisation. They are in denial of the following facts:

    - Languages usually do not specify important info that may be used for optimisation, eg typical values of inputs

    - Algorithm cannot be separated from architecture. A vector machine might do DFTs faster than FFTs. No compiler can turn FFT into DFT.
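
    For reference, the two algorithms named in the last point, as a Python sketch. The function names are hypothetical, the test signal is made up, and the FFT shown is the plain radix-2 recursive form, so its input length must be a power of two. Both compute the same transform; what differs is the structure. The DFT is a set of independent sums with regular access patterns, while the FFT does far fewer operations but through a recursive, data-shuffling structure.

```python
import cmath


def dft(samples):
    # Direct definition: every output is an independent sum over all inputs.
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def fft(samples):
    # Radix-2 decimation in time; len(samples) must be a power of two.
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
max_error = max(abs(a - b) for a, b in zip(dft(signal), fft(signal)))
print(max_error < 1e-9)
```

    Deciding which of the two suits a given machine is a choice about the algorithm, which is the poster's argument for why it falls outside what a compiler rewrites on its own.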

  45. An easier to program Itanic-workalike? by UnapprovedThought · · Score: 2, Interesting

    What this is *not* in any form is a general purpose CPU.

    The article doesn't seem to agree:

    One of the big challenges to becoming a mainstream commercial processor is compatibility with existing software and systems, especially x86 compatibility, Moore says. But one way to maintain compatibility would be to use Trips as a co-processor, he says. "The general-purpose [x86] processor could offload heavy tasks onto the co-processor while still handling legacy compatibility on its own."

    So, it looks like they're trying to get Intel or AMD interested in producing a heterogeneous multi-core unit that includes their trippy core, in the hopes of keeping the number of cores (and their communications overhead) down to a minimum. Intel already has a form of (so-called) instruction-level parallelism with the Itanic, and it didn't work out too well (except maybe for crypto-heavy workloads). It's possible AMD will be mulling it over. One of the things they will have to worry about is whether a compiler can actually be written to use it, FTA:

    ... the Trips compiler sends executable code to the hardware in blocks of up to 128 instructions.

    With 128 instructions to schedule at once, that might provide a chance to actually keep all of the processing units on the chip busy. With the Itanic, it was really a challenge to do that, since you had to pull two floating point instructions out from somewhere in every clock cycle, something that not all workloads could accomplish, and I can see the compiler writers going crazy trying to produce some sort of ultimately self-defeating hacks trying to get that accomplished :)

  46. Dataflow is Non-algorithmic by MOBE2001 · · Score: 1

    So why aren't dataflow machines mainstream?

    The reason is that dataflow is really a non-algorithmic, signal-based approach to computing whereas most programming languages and applications are strictly algorithmic. We need to change our way of programming in a radical way before the non-algorithmic model can take off. It's not easy to translate algorithmic code into a dataflow application.

    In my opinion, even though TRIPS has 'reliability' in its acronym, unless the execution of parallel code in a given object is synchronous, there is no way it can enforce reliability. To get an idea as to how a signal-based synchronous architecture can enforce software reliability, see the link below.

  47. yes, lots in the theory area by rebelcool · · Score: 1

    There was a somewhat famous CS person named Dijkstra who taught there for years. Perhaps you've heard of him.

    He set the tone for UT's best known research for years - theory. They've also got a couple of well known robotics labs (not as well funded as CMU, but they're more focused on improving the software brains than building big flashy machines to crash around in a desert)

    Beyond CS undergrad (which is UT's second largest major, behind Biology - and UT is the highest populated university in the USA), UT's got a good grad program.

    I had a class with Doug Berger. Great guy. Brilliant, too.

    Just because you haven't heard of something doesn't mean it doesn't exist. Most work universities do doesn't get published on slashdot - it goes into research journals and conferences that I'm sure you don't read or attend.

    --

    -

  48. there is no lack of release software by rebelcool · · Score: 1

    you just have a lack of knowledge about it.

    the UT applied research lab has developed the basis technology behind pretty much every US military sonar system in use since WWII. Ditto with a number of satellite and other techs (mostly defense related, but all that trickles down into mainstream usage). ARL is a combination of CS, ME, EE and other engineering fields.

    Numerous search engine technologies and the closely related 'recommendation' systems that places like Amazon use have been born and bred...

    UT does mostly foundational software and research work, which is acquired and built upon by others.

    --

    -

  49. It's not meant to be general purpose by BigMFC · · Score: 1

    TRIPS definitely doesn't look like it was targeted at the desktop; it looks more suited to DSP-like apps requiring high throughput and a constant data input stream. They mention this in the article (software-defined radio, co-processors for actual general purpose processors). So an architecture like this may be competitive w/ something like Cell or Imagine or maybe TI's high-end DSPs, but not with a single-core processor targeted at business apps and the like on x86 platforms. And part of the reason why TRIPS is a good design is that the compiler guys and the hardware guys are part of the same group and probably sat down and hammered out an ISA that allows for maximum extraction of parallelism by the compiler. Btw the main reason why we're still using x86 is economics. It would require not just a better design to get companies to suddenly move on and abandon their legacy stuff, it would require something revolutionary with insanely good marketing. The drift to RISC-type ISAs is happening... just very very slowly (I believe both AMD and Intel convert x86 CISC-type instructions into RISC-like uOPs which are then executed, no?)

  50. Is this different from superscalar architectures? by Anonymous Coward · · Score: 0

    It seems to me that they only took the superscalar architecture description from Hennessy & Patterson, the description of Tomasulo's algorithm, and added a bit more of everything. And yes, they hope that they will have enough instruction-level parallelism to utilize all the functional units they have.

  51. LabVIEW, by National Instruments, of Austin, TX by mosel-saar-ruwer · · Score: 1

    Does any major piece of software that folks use come from UT? I can think of famous projects from MIT, Berkeley, Stanford, CMU, Caltech, Cornell ... But I can't think of a single one from UT.

    National Instruments, of Austin, TX, sells a graphical programming language, called LabVIEW, which has about a 90% market share in the research sector [both for-profit and "not-for-profit"], and which is moving aggressively into the automation sector [i.e. the factory floor].

    PS: Ironically, LabVIEW 8.0 was just announced yesterday.

    PPS: Unlike many of their competitors [e.g. Agilent], National Instruments weathered the dot-com/dot-bomb tech debacle pretty well:

    http://finance.yahoo.com/q/bc?s=NATI&t=my&l=off


  52. National Instruments -vs- Agilent by mosel-saar-ruwer · · Score: 1

    Unlike many of their competitors [e.g. Agilent], National Instruments weathered the dot-com/dot-bomb tech debacle pretty well...

    Here's a better graphic of what I was talking about:

    National Instruments -vs- Agilent
    Or this:
    National Instruments -vs- Teradyne
  53. programmers, wait a while by fikx · · Score: 1

    For all those who program (except maybe for those who work in assembly) this will not impact you directly for a while. This is at the machine level and it's a different kind of architecture. It may need new compilers or even new languages. But, it's not there yet. Just because you program every day doesn't mean this will apply to you. This is mainly for the folks who are interested in more than just how to make loops in code.
    So, if all you do is program computers, wait for the trickle down before claiming it's good or bad...

    --
    AB HOC POSSUM VIDERE DOMUM TUUM
  54. Lisp Machine? by Doctor+Faustus · · Score: 1

    If each instruction executes when its inputs are available, rather than in any specified order, and passes outputs to the next instruction, rather than to a specific register, it seems like such a system would be best for functional programming. Is there any truth to that?

  55. Academia by Erich · · Score: 1
    This is like a host of other academic projects. They all start out with the premise "Suppose I have this grid of CPUs/ALUs/whatever". Then they use an army of grad students to hand code for the grid. You get some interesting SPEC results, publish some papers, and get more research money. This is not new, this has been the case for a long, long time.

    But often the ideas don't pan out in real life. With TRIPS, you get inflated IPC results from inflated instruction counts from huge superblock schedules. The TRIPS compiler (last I saw) was not suitable for real life applications. The fact that they can fab things like TRIPS chips and boards only shows that we have so many transistors on a chip these days, any crazy-ass idea you have can be produced.

    With modern out-of-order pipelines, you get instruction issues in dataflow order. You have to be very careful when trying to encode dataflow in the instruction set. If you obey it, you can have problems when instructions don't have the latency you expect (like cache misses).

    If you want my opinion on where the interesting ideas that will get used in the future are, look at what people are going to do with PS3/Cell. Look at languages and operating environments that facilitate parallel structuring of code. Once you get the program into independent threads of execution with low synchronization, the amount of independent instructions you can present to the machine skyrockets.

    I dunno, maybe I'm too critical. The guys at UT Austin are doing interesting things, and getting interesting results. If nobody was doing research, nobody would come up with the clever ideas we need in the industry. Work on the compiler, especially, has usefulness outside of a TRIPS ALU grid. I just think this "grid of ALUs" idea is getting old, and since UT and UW like these ideas, it's most of what you see in ISCA and MICRO.

    On the other hand, I guess if you have a billion transistors, why not throw a big grid of ALUs in there?

    I guess we'll see what pans out in the industry in the upcoming years. I'd place my money on more threads/CPUs, and not so much in the "sea of ALUs" approach. But I know the company I left last year was thinking of this kind of idea in a real product. If you can make it work for real, you can make some bucks. If not... well you can publish some papers I guess. :-)

    --

    -- Erich

    Slashdot reader since 1997

  56. Doug Berger is an excellent teacher! by santiago · · Score: 1

    If you have a chance to take his Microprocessor Architecture class, do so. He rocks! We had people from the other professors' sections sneaking into ours to actually learn the material.

    1. Re:Doug Berger is an excellent teacher! by bushk · · Score: 1

      it's funny that a student of his could misspell his name. ;)
      s/berger/burger/g

    2. Re:Doug Berger is an excellent teacher! by santiago · · Score: 1

      Doh! It's been five years, and I just copied the spelling from the Slashdot article. Plus, my name has a "Berg" in it, so it looks like a perfectly fine syllable to me. (And, just to add to the confusion, my wife's has a "borg" in it.)

  57. A vastly better site for information by jd · · Score: 2, Informative

    The TRIPS homepage has nine published papers on how this design will work and a schematic diagram of what they're expecting the design to end up looking like. They are also promising simulators and compilers later this year.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  58. Re:Boring (article, not project) by Nick+Nethercote · · Score: 1
    What this is *not* in any form is a general purpose CPU. It won't boot linux, plain and simple. This is for doing stream data processing such as compression or HPC simulations.


    Wrong. It is entirely general purpose. Although the prototype won't boot Linux -- the prototype TRIPS boards will run under the control of an embedded Linux on a PPC machine -- that is only being done to simplify the design of the prototype chip and so no OS has to be ported to it to make the thing work.
  59. What A Long Strange Trip It's Been by Anonymous Coward · · Score: 0

    ... Captain Trips, my first thought as well.

    RIP Jerome John Garcia 1942-1995

    Thanks for the Memories.

  60. Re:Is this different from superscalar architecture by bushk · · Score: 1

    hardly.

    superscalar architectures are dynamic placement, dynamic issue: making it the hardware's responsibility to figure out both WHERE and WHEN to fire an instruction. this carries tremendous control and logic overhead.

    TRIPS is a static placement, dynamic issue architecture: thus the compiler (or assembly language programmer) decides WHERE (i.e. which ALU) to place an instruction, and the instructions fire dynamically - in this case, when all of its inputs have arrived.
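
    A toy sketch of that split, in Python. This is not the real TRIPS ISA or scheduler: `Instr`, `run`, the fixed two operands per instruction, the one-instruction-per-ALU-per-cycle rule and the end-of-cycle operand delivery are all assumptions made up for illustration. The ALU number is fixed in the program (WHERE), while firing is triggered only by operand arrival (WHEN).

```python
from dataclasses import dataclass, field
from typing import Callable
import operator


@dataclass
class Instr:
    name: str
    op: Callable[[int, int], int]
    alu: int                                       # WHERE: fixed at "compile" time
    targets: list = field(default_factory=list)    # (consumer name, operand slot)
    operands: dict = field(default_factory=dict)   # slot -> value, filled at run time


def run(program, inputs):
    """Fire every instruction whose two operands have arrived, cycle by cycle."""
    instrs = {i.name: i for i in program}
    for (name, slot), value in inputs.items():
        instrs[name].operands[slot] = value

    results, trace, fired = {}, [], set()
    while len(fired) < len(instrs):
        # WHEN: decided at run time, purely by operand arrival.
        ready = [i for i in program
                 if i.name not in fired and len(i.operands) == 2]
        if not ready:
            raise RuntimeError("deadlock: some operands never arrive")
        busy, delivered = set(), []
        for ins in ready:
            if ins.alu in busy:        # assumed limit: one instruction per ALU per cycle
                continue
            busy.add(ins.alu)
            value = ins.op(ins.operands[0], ins.operands[1])
            results[ins.name] = value
            fired.add(ins.name)
            delivered.append((ins, value))
        trace.append(sorted(i.name for i, _ in delivered))
        # Results reach their consumers at the end of the cycle.
        for ins, value in delivered:
            for target, slot in ins.targets:
                instrs[target].operands[slot] = value
    return results, trace


# (a + b) * (c - d): program order is irrelevant, "mul" is listed first
# but cannot fire until both of its operands have been produced.
program = [
    Instr("mul", operator.mul, alu=0),
    Instr("add", operator.add, alu=0, targets=[("mul", 0)]),
    Instr("sub", operator.sub, alu=1, targets=[("mul", 1)]),
]
inputs = {("add", 0): 2, ("add", 1): 3, ("sub", 0): 10, ("sub", 1): 4}
results, trace = run(program, inputs)
print(results["mul"], trace)   # 30 [['add', 'sub'], ['mul']]
```

    In this sketch the add and the sub fire in the same cycle because they sit on different ALUs and neither waits on the other; placing both on ALU 0 would serialize them without changing the result. Note that `run` mutates the `Instr` objects, so a program has to be rebuilt before it is run again.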

  61. Where Dataflow works and where it doesn't by Anonymous Coward · · Score: 0
    Dataflow hardware has been around a long time, and it does not have a good track record. There was a dataflow machine research project at Manchester University in England from 1976 to 1995, but it is no longer active, at least according to this web page http://www.cs.manchester.ac.uk/cnc/projects/dataflow.html

    There have been many research machines, but no successful commercial products. Data flow techniques seem to have had their greatest impact in two areas: compiler optimization and instruction scheduling inside the CPU. Many optimizations use SSA, or static single assignment. SSA means that any variable is only assigned a value once. Converting to SSA means that the code can be represented as a Directed Acyclic Graph (DAG), and this is useful for code generation. Dataflow is also implemented in hardware to enable parallelism and features like speculative execution in the pipeline.
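
    To make the SSA point concrete, here is a minimal Python sketch that renames straight-line three-address code so each name is assigned once, and collects the resulting dependency edges. The `to_ssa` helper and the tuple format are hypothetical, invented for this example. It assumes a single basic block: real SSA construction also has to handle branches and loops (with phi functions), which this does not attempt.

```python
def to_ssa(code):
    """code: list of (dest, op, arg1, arg2); args are variable names or ints."""
    version = {}        # variable -> latest version number
    defined = set()     # SSA names produced so far
    ssa, edges = [], []

    def use(arg):
        # Constants and never-assigned inputs are left unchanged.
        if isinstance(arg, str) and arg in version:
            return f"{arg}{version[arg]}"
        return arg

    for dest, op, a, b in code:
        a, b = use(a), use(b)                 # read the current versions first
        version[dest] = version.get(dest, 0) + 1
        new = f"{dest}{version[dest]}"        # every assignment gets a fresh name
        ssa.append((new, op, a, b))
        for src in (a, b):
            if src in defined:
                edges.append((src, new))      # producer -> consumer
        defined.add(new)
    return ssa, edges


code = [
    ("x", "+", "a", "b"),
    ("x", "*", "x", 2),     # reassigns x, so SSA splits it into x1 and x2
    ("y", "-", "x", "a"),
]
ssa, edges = to_ssa(code)
for line in ssa:
    print(line)
print(edges)   # [('x1', 'x2'), ('x2', 'y1')]
```

    Every edge runs from an earlier definition to a later one, so for straight-line code like this the dependency graph cannot contain a cycle. Any two definitions with no path between them are candidates to be scheduled in parallel.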

    Experience has shown that there is only so much parallelism that can usefully be exploited using either compiler or hardware based dataflow based techniques. This is not a good sign for this project, unless they are targeting primarily very parallel applications, for example DSP algorithms or image processing. Even so, other research groups have tried this and failed (or at least not succeeded). One is the RAW architecture at MIT: http://cag-www.lcs.mit.edu/raw/ Another example is iWarp, a CMU/Intel systolic processor. RAW is currently active, iWarp is over.

  62. Correct that... by corngrower · · Score: 1

    make that 780 million ARM processors, with 80% of cell phones using them. I believe they're on their way to 1 billion this year.

  63. But then again... by Anonymous Coward · · Score: 0

    In communist russia, CPU uses Users to compute in parallel.

  64. dumbass by Anonymous Coward · · Score: 0

    Oh look! Another asshat attempting to googlebomb someone via their sig. /. adds 'rel="nofollow"' to links in .sigs, making your feeble attempt completely pointless.

    <RedForman>DUMBASS!</RedForman>

    --
    Anonymous Coward - Educating dumbass Slashdotters since 2005.

  65. Re:Boring (article, not project) by Alsee · · Score: 1

    it would actually be possible to compile C code (such as the Linux kernel) into a gate array and run it on such a chip.

    That fact is a lot like the fact that Saturn would float if you dropped it in the ocean. Both are technically true (Saturn is mostly hydrogen and helium and really is less dense than water), but Linux will no more fit in any existing FPGA than Saturn will fit in any existing ocean. Chuckle.

    -

    --
    - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  66. Re:Boring (article, not project) by Anonymous Coward · · Score: 0

    If you compile Gentoo Linux with full space optimizations, you can fit it onto a four-function solar-powered calculator, with room left over for Tetris and Minesweeper.

  67. Dataflow's been around for a while by some+guy+I+know · · Score: 1
    about 6 years old
    I remember reading about dataflow architectures when I was in college in the 1970s.
    See also this post.
    --
    Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana