Next-Gen Processor Unveiled

I want one by Normal Dan · 2007-04-24 08:36 · Score: 4, Insightful

But when are they likely to be ready?

--
A unique way to learn a language: http://languageloom.com

Re:I want one by ackthpt · 2007-04-24 08:51 · Score: 5, Funny
But when are they likely to be ready?
- You know they'll be ready when Intel places large orders for aluminium for heatsinks.
- You know they'll be ready when there's a sudden drop in prices of the current Hot CPUs, which are all proven but suddenly look like last month's pizza from under the couch.
- You know they'll be ready when AMD hasn't said anything and they are suddenly shipping them, while Intel tells you in 9 mos. then suddenly says 3 mos. (and you can hear the whips cracking through the walls.)
- You know they'll be ready when Microsoft doesn't have an operating system ready, but there are a dozen Linux distros good to go.
--

A feeling of having made the same mistake before: Deja Foobar
Re:I want one by Anonymous Coward · 2007-04-24 09:08 · Score: 0

So, now...?
Re:I want one by quiahuitl · 2007-04-24 09:51 · Score: 1

So, now we need faster hard disks!
Re:I want one by mrbluze · 2007-04-24 10:29 · Score: 4, Funny

Well, I won't buy one until the Super version comes out (STRIPS). Now that's a name that has appeal!

--
Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
Re:I want one by Anonymous Coward · 2007-04-24 11:00 · Score: 0

No, that's sequential - it's more like:

But when are they likely to be ready? But when are they likely to be ready? But when are they likely to be ready?
Re:I want one by Anonymous Coward · 2007-04-24 11:04 · Score: 2, Funny

Hsss, he ruins it! The fat hobbit ruins it!
Re:I want one by Heembo · 2007-04-24 20:30 · Score: 1

I have an idea, let's come up with non-volatile computer memory that can be electrically erased and reprogrammed without the need for motors or platters?

--
Horns are really just a broken halo.
Re:I want one by badspyro · 2007-04-24 21:13 · Score: 1

Realistically, we need a faster bus before we need faster HDDs.
Re:I want one by Anonymous Coward · 2007-04-25 02:02 · Score: 0

# You know they'll be ready when there's a sudden drop in prices of the current Hot CPUs, which are all proven but suddenly look like last month's pizza from under the couch.

http://www.tgdaily.com/content/view/31743/118/

Hm... by imsabbel · 2007-04-24 08:37 · Score: 4, Insightful

The article contains little more information than the blurb.
But it seems to me that we called this great new invention "vector processors" 15 years ago, and there is a reason they aren't around anymore.
"Many instructions in flight"=="huge pipeline flushes on context switches"+"huge branching penalties" anybody?

--
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?

Re:Hm... by superpulpsicle · 2007-04-24 08:42 · Score: 4, Interesting

Come on now. It's a capitalist market. You can't just innovate your way to fame. Just like the list of 5 million other patents that never see the daylight.
Re:Hm... by volsung · 2007-04-24 08:42 · Score: 5, Informative

The vector processors never went away. They just became your graphics card: 128 floating point units at your command

BTW, here is a real article on TRIPS.
Re:Hm... by Anonymous Coward · 2007-04-24 08:48 · Score: 3, Informative

Actually, it is more like the dataflow architectures from the 70s. Vector processors are a totally different kind of thing (SIMD).

The idea is simple, instead of discovering instruction level parallelism by checking the dependencies and anti-dependencies by global names (registers), define the dependencies directly by relating to instructions themselves.

> "Many instructions in flight"=="huge pipeline flushes on context switches"+"huge branching penalties" anybody?

That equality does not exist. It is a wide parallel execution, not super-pipelining, ergo no huge branching penalties.
Also, the architecture is more likely exploiting the wide execution unit by predicating both branches and calculating them both.
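
To make the "dependencies defined on the instructions themselves" idea concrete, here is a minimal sketch in Python. It is a toy model, not the TRIPS encoding: the instruction table layout, the `run_dataflow` function, and the slot numbering are all assumptions made up for illustration. Each instruction lists the consumers of its result, and an instruction fires as soon as all of its operand slots have been filled.

```python
from collections import deque

def run_dataflow(instrs, seeds):
    """Toy dataflow interpreter (hypothetical encoding, for illustration only).

    instrs: {name: (op, arity, [(target_name, operand_slot), ...])}
    seeds:  [(target_name, operand_slot, value), ...] initial operand values
    """
    operands = {name: {} for name in instrs}  # operand slots filled so far
    results = {}
    ready = deque()

    def deliver(target, slot, value):
        # Route a value straight to a consumer; no shared register name involved.
        operands[target][slot] = value
        if len(operands[target]) == instrs[target][1]:
            ready.append(target)  # all operands present: instruction may fire

    for target, slot, value in seeds:
        deliver(target, slot, value)

    while ready:
        name = ready.popleft()
        op, arity, targets = instrs[name]
        value = op(*(operands[name][i] for i in range(arity)))
        results[name] = value
        for target, slot in targets:
            deliver(target, slot, value)
    return results

# (a + b) * (a - b) with a = 5, b = 3
instrs = {
    "add": (lambda x, y: x + y, 2, [("mul", 0)]),
    "sub": (lambda x, y: x - y, 2, [("mul", 1)]),
    "mul": (lambda x, y: x * y, 2, []),
}
seeds = [("add", 0, 5), ("add", 1, 3), ("sub", 0, 5), ("sub", 1, 3)]
results = run_dataflow(instrs, seeds)
print(results["mul"])  # prints 16
```

Note that "add" and "sub" become ready independently of each other, which is where the parallelism would come from in hardware; this sketch just drains them from a queue one at a time.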
Re:Hm... by MindStalker · 2007-04-24 09:03 · Score: 1

Also, with the move towards multi-core there is the potential that special tasks could stay on one processor for their entire length. This would renew reason to look at vector processing for main CPU usage.
Re:Hm... by naoursla · 2007-04-24 10:16 · Score: 1

I only have a passing familiarity with TRIPS but I think that one of the goals was to get rid of the huge costs for pipeline flushes.

Doug Burger is one of the main PIs on this project (which is around seven years old at this point). I'm sure you can find more information there if you are interested.
Re:Hm... by flaming-opus · 2007-04-24 10:18 · Score: 1

I'd say it looks more like VLIW or EPIC. Instructions grouped into blocks with no looping or data dependence, running in parallel. It looks like a 16-wide Itanium, with all the same problems actually generating code that will run on it very well.
Re:Hm... by frank_adrian314159 · 2007-04-24 11:11 · Score: 5, Informative

No, here are the real articles on TRIPS. These and many others can be found here.

--
That is all.
Re:Hm... by DigiShaman · 2007-04-24 12:05 · Score: 1

Vector processing instruction sets are also implemented in your modern CPU. They're known as MMX, SSE, SSE2, SSE3, SSE4, 3DNow! and so on.

--
Life is not for the lazy.
Re:Hm... by Steendor · 2007-04-24 12:48 · Score: 1

...exploiting the wide execution unit by predicating both branches and calculating them both.

So what happens when you have more possible branches than you have execution units?
Re:Hm... by rbanffy · 2007-04-24 13:07 · Score: 1

You can get away with little to no penalties on context switches by having a context-file around. This way you keep the top most used contexts on chip and only hit memory when you swap a context to/from memory - and even that can have reduced impact if it happens while you are running other on-chip contexts.

You could also avoid some switching by keeping micro-contexts - separating the context of the various units and letting the software deal with them independently. This way, if you only have use for 5 processing units in your thread, only five of them will be swapped out when the next thread has to run.
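
A rough sketch of the context-file idea, in Python. Everything here is hypothetical: the `ContextFile` class, the slot count, and the use of least-recently-used eviction as a stand-in for "most used" are assumptions for illustration, not anything from the TRIPS papers.

```python
from collections import OrderedDict

class ContextFile:
    """Toy model: a few thread contexts stay on chip, the rest live in memory."""

    def __init__(self, slots):
        self.slots = slots
        self.on_chip = OrderedDict()  # thread_id -> context, oldest use first
        self.memory = {}              # contexts that have been swapped out
        self.swaps = 0                # how many times we had to hit memory

    def switch_to(self, thread_id):
        if thread_id in self.on_chip:
            self.on_chip.move_to_end(thread_id)  # mark as most recently used
            return "hit"                         # no memory traffic needed
        if len(self.on_chip) >= self.slots:
            # Evict the least recently used context (assumed policy).
            victim, ctx = self.on_chip.popitem(last=False)
            self.memory[victim] = ctx
            self.swaps += 1
        # Bring the context in from memory, or create a fresh one.
        self.on_chip[thread_id] = self.memory.pop(thread_id, {"regs": [0] * 4})
        return "miss"

cf = ContextFile(slots=2)
trace = [cf.switch_to(t) for t in (1, 2, 1, 3, 2)]
print(trace, cf.swaps)
```

With two slots, switching between threads 1 and 2 never touches memory; only the arrival of thread 3 forces a swap.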

Or not. Next thing to do is to read the fine article. I got curious.

--
http://www.dieblinkenlights.com
Re:Hm... by volsung · 2007-04-24 13:07 · Score: 1

Yes, but operating on 2 doubles or 4 floats at a time is only barely "vector" processing. Hopefully the merger of ATI and AMD will bring some wider vector units closer to the CPU.
Re:Hm... by ConceptJunkie · 2007-04-24 13:21 · Score: 1

So what happens when you have more possible branches than you have execution units?

Silly, you run over to MicroCenter and buy more hardware.

--
You are in a maze of twisty little passages, all alike.
Re:Hm... by Jeff DeMaagd · 2007-04-24 13:30 · Score: 1

Thank you for telling me that Earth Simulator went away 15 years ago!
Re:Hm... by briancnorton · 2007-04-24 13:58 · Score: 1

Perhaps the "next-gen" is specialized coprocessors?

--
People who think they know everything really piss off those of us that actually do.
Re:Hm... by 3rd_Floo · 2007-04-24 13:59 · Score: 1

No, as TFA pointed out, this is Texas, you go to Frys.
Re:Hm... by Anonymous Coward · 2007-04-24 14:44 · Score: 0

http://www.w3.org/TR/WCAG10-HTML-TECHS/#link-text
Re:Hm... by SiJockey · 2007-04-24 15:47 · Score: 4, Informative

The big difference in TRIPS is that stuff flying around out in memory can be squashed easily. The machine has aggressive branch prediction, efficient predication support in the ISA, and data dependence prediction. So, the 1024 instructions don't need to be long vectors streaming from memory. Squashing a mispredicted branch and restarting down the right path takes on the order of 10-20 machine cycles. Thanks for your comments and interest. -DB

--
--+-- Doug Burger, UT-Austin Computer Sciences
Re:Hm... by David Greene · 2007-04-24 16:09 · Score: 1

Doug was one of the students who worked on the MultiScalar project at Wisconsin over a decade ago. Lots of good research out of that project and TRIPS bears more than a passing resemblance to MultiScalar. But, of course, it's different. The compilation model is much simpler and in the interim between the projects there have been a number of advances in compiler/architecture cooperation.

TRIPS is definitely not a vector processor. As another poster said, it's close to a dataflow machine, but it also has some of the characteristics of fine-grained multithreading. I'd place it in the spectrum as an architecture engineered to be as close to dataflow as possible while still being practical.

Back when I read the papers, when they had simulator results, it struck me that the individual scalar processors were pretty aggressive, very out-of-order, etc. Complex and difficult to design. I am very curious about what they were able to do in actual silicon. Looks like it's time to read some more papers. :)

--
Re:Hm... by Anonymous Coward · 2007-04-24 19:06 · Score: 0

Damn bro, did you already get one? Cause you're TRIP'in :P

j/k...
Re:Hm... by MORB · 2007-04-24 21:27 · Score: 1

There is detailed information available, including the ISA of their prototype.

http://www.cs.utexas.edu/~trips/publications.html

Instructions are grouped but WITH data dependencies, that are explicitly encoded in the instruction stream, which means that generating code running well on that thing doesn't sound that difficult. IANA compiler expert but I think a compiler able to generate good code for this out of regular, scalar code sounds quite plausible.
Re:Hm... by Hieronymus Howard · 2007-04-24 23:21 · Score: 1

"But it seems to me that we called this great new invention "vector processors" 15 years ago, and there is a reason they aren't around anymore."

I'm willing to bet that you typed that on a machine with a vector processor. What happened is that they became integrated into general purpose CPUs. The Altivec unit in my Mac's Power PC chip is a vector processor, as is the SSE unit in Intel CPUs.
Re:Hm... by sraasch · 2007-04-25 02:40 · Score: 1

Doug!

Congratulations on getting silicon! (And making /.)

-Steve
Re:Hm... by Tower · 2007-04-25 05:01 · Score: 1

Or, for example, the Cell, which (depending on the data size) can operate on 128-bit vectors (16 8-bit integers, 8 16-bit integers, 4 32-bit integers, or 4 single precision floating-point numbers). And yes, it is multi-core as well, with 8 SPEs on a single chip.
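
The lane counts above are just 128 divided by the element width. A quick Python check, using the standard `struct` module to confirm that each layout fills exactly 16 bytes (the `LAYOUTS` table and `lane_count` helper are names made up for this sketch; nothing here touches real Cell hardware):

```python
import struct

VECTOR_BITS = 128

def lane_count(element_bits):
    """Number of elements of the given width that fit in one 128-bit vector."""
    return VECTOR_BITS // element_bits

# struct format strings with standard sizes: b = int8, h = int16, i = int32, f = float32
LAYOUTS = {
    "16 x int8": "<16b",
    "8 x int16": "<8h",
    "4 x int32": "<4i",
    "4 x float32": "<4f",
}

sizes = {name: struct.calcsize(fmt) for name, fmt in LAYOUTS.items()}
for name, size in sizes.items():
    print(name, size * 8, "bits")
```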

--
"It's tough to be bilingual when you get hit in the head."

Next-Gen Business Model Unveiled by Anonymous Coward · 2007-04-24 08:38 · Score: 3, Insightful

1. Copy some university press release to your blog
2. Make sure google ads show up at the top of the page
3. Submit blog to slashdot
4. Profit

Re:Next-Gen Business Model Unveiled by Anonymous Coward · 2007-04-24 08:42 · Score: 0

good idea, I'm putting that on my blog
Re:Next-Gen Business Model Unveiled by richdun · 2007-04-24 08:44 · Score: 1

Actually, to be next-gen, you need to add flashy graphics or random Ajax to your blog as well, since that is already the current-gen business model.
Re:Next-Gen Business Model Unveiled by Anonymous Coward · 2007-04-24 09:03 · Score: 0

i thought there were only 3 steps to profit :| .. or was that pre-web 2.0 ?
Re:Next-Gen Business Model Unveiled by Manos_Of_Fate · 2007-04-24 09:09 · Score: 2, Funny

You forgot the embedded YouTube video...

--
Isn't enough that I ruined a pony, making a gift for you?
Re:Next-Gen Business Model Unveiled by richdun · 2007-04-24 09:47 · Score: 1

Drat! I AM TEH L0S3R!

Ew... Where did that come from? I need to get out more.
Re:Next-Gen Business Model Unveiled by p!ssa · 2007-04-24 12:10 · Score: 1

And rounded corners!

Yes but can it run by Timesprout · 2007-04-24 08:38 · Score: 0, Redundant

Vista?

--
Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
What truth?
There is no dupe

Re:Yes but can it run by Anonymous Coward · 2007-04-24 08:43 · Score: 0

Yes, but it's still a little sluggish.

Marketting hype? by faragon · 2007-04-24 08:40 · Score: 5, Informative

Each TRIPS chip contains two processing cores, each of which can issue 16 operations per cycle with up to 1,024 instructions in flight simultaneously. Current high-performance processors are typically designed to sustain a maximum execution rate of four operations per cycle.

Is it just me, or are they euphemistically paraphrasing out-of-order execution?

Re:Marketting hype? by Aadain2001 · 2007-04-24 09:02 · Score: 4, Informative

Based on the article, "TRIPS" is nothing more than an Out-Of-Order (OOO) SuperScalar based processor. So unless the article is grossly simplifying (possible), this is nothing but a PR stunt. And based on the quote from one of the Professors about building it on "nanoscale" technology (um, been doing that for years now), my vote is pure PR BS.
And as an aside, the reason modern CPUs are designed to "only" issue 4 instructions per cycle instead of 16 is because after years of careful research and testing real work applications, 4 instructions is almost always the maximum number of instructions any program can concurrently issue, due to issues like branches, cache-misses, data dependencies, etc. Makes me question just how much these "professors" really know.

--
Space for rent, inquire within
Re:Marketting hype? by Doches · 2007-04-24 09:17 · Score: 3, Interesting

Branches are no problem for TRIPS -- in the EDGE architecture, both execution paths resulting from a branch are computed, unlike in classic architectures where the processor blocks (8086), skips ahead a single instruction before blocking (MIPS), or chooses a path using a branch predictor and executing it, possibly only to discard all instructions issued since the branch, if the predictor turns out wrong. EDGE architectures still lag on cache misses (or any memory hit) -- but that's fundamentally a problem with memory, not with the processor. Don't read the article, read the UT pdf.
Re:Marketting hype? by Randolpho · 2007-04-24 09:35 · Score: 1

hmmmmm.... sounds almost exactly like IA64, something Intel has had since the turn of the century.

--
"Times have not become more violent. They have just become more televised."
-Marilyn Manson
Re:Marketting hype? by Wesley Felter · 2007-04-24 09:46 · Score: 1

Yes, TRIPS is an out-of-order superscalar processor. But it's bigger and better: by eliminating centralized structures, a TRIPS core can issue more instructions per cycle out of a bigger instruction window. It's not just more of the same; it's a qualitative improvement that allows much bigger (and thus higher performance) cores to be built, yet with lower power and design costs.
Re:Marketting hype? by faragon · 2007-04-24 10:01 · Score: 1

Each of the two processor cores can execute up to 16 out-of-order operations
vs
Current high-performance processors are typically designed to sustain a maximum execution rate of four operations per cycle.

They are comparing oranges against apples (!), as you cannot compare 16 OoO-executed instructions per cycle with 4 *resulting in-order* instructions per cycle (where, to achieve those 4 instructions/cycle, you may have had to execute 10, 20 or more OoO instructions (!)). Please, where is the rigor? Fair play, anyone?
Re:Marketting hype? by Anonymous Coward · 2007-04-24 10:21 · Score: 0

I took Burger's computer architecture class back in the day (read: 4 years ago) and I discussed the TRIPS processor as well as MRAM with him extensively. The guy knows what he's talking about. Now as to whether or not the TRIPS will be some big revolution is for the market to decide. There are some really good ideas coming from some really smart people over there. Sadly, that doesn't necessarily mean that the good ideas will come to fruition.
Re:Marketting hype? by ghoul · 2007-04-24 10:24 · Score: 1

Actually the TRIPS is somewhere between OOO and VLIW (Itanium). The explicit data flow information embedded in the instruction enables much larger instruction windows than possible in traditional OOO. As instructions are not just loaded into the window in a dumb manner, it reduces the chances and costs of pipeline flushes.

--
**Life is too short to be serious**
Re:Marketting hype? by arktemplar · 2007-04-24 10:29 · Score: 1

It possibly is market hype. Space on silicon is extremely costly; by doing something like pipelining and parallelism they would be wasting precious silicon real estate. I think that the Cell processor (and no, I am not an IBM fanboi) is closer to what you should expect from even the GPUs of tomorrow, and personally, I think (hope, even though I am a VLSI designer) that silicon will be obsolete in a couple of decades or so.

--
blog plug -> The Darker Side of Light
Re:Marketting hype? by renoX · 2007-04-24 10:44 · Score: 1

No, in the IA64, as in VLIW, the instructions are scheduled by the compiler (which works well on very regular code, poorly everywhere else), whereas in TRIPS the resources on each execution unit are used dynamically.

From the paper "Scaling to the End of Silicon with EDGE Architectures", TRIPS ISAs are hardware dependent though, which means that you'd have to recompile your applications each time you use a new CPU. If I understood correctly, this is a significant problem (that and the memory wall).
Re:Marketting hype? by baggins2001 · 2007-04-24 10:53 · Score: 1

And as an aside, the reason modern CPUs are designed to "only" issue 4 instructions per cycle instead of 16 is because after years of careful research and testing real work applications, 4 instructions is almost always the maximum number of instructions any program can concurrently issue, due to issues like branches, cache-misses, data dependencies, etc. Makes me question just how much these "professors" really know.
This processor was designed for parallel processing. Its intent is to be used in a large supercomputer that they are building in Austin. So the programs will be designed to take advantage of the CPU capability. This is not a chip to replace a Pentium or AMD processor running MS Word. This is a chip that will be used in a huge parallel computer system replacing AMD or Intel quad core chips, which they are currently waiting on.

They got the funding by saying it could be used to design game plans that would defeat Oklahoma into the next century. Unless of course Oklahoma gets a really big super computer.

--
He who said 1,000,000 monkeys on 1,000,000 typewriters would eventually type the great novel, never saw an AOL chat room
Re:Marketting hype? by smallfries · 2007-04-24 11:12 · Score: 5, Interesting

Multiway branching is ancient, and it's not used much because it's very inefficient. At least half of the instruction stream after a branch will be canceled, two branches deep it is 75% and so on. No matter how much parallelism you throw at this there are only marginal gains to be made (exponential increase in number of execution units for a linear increase in depth). It still doesn't get around data dependencies, which will be the major bottleneck if looking that far ahead in the instruction stream.
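
The 50% / 75% figures follow from simple counting: if every path past each two-way branch is executed eagerly, there are 2^d paths at depth d and only one of them turns out to be the real one. A minimal sketch in Python (the `eager_execution_cost` helper is a name invented for this example, and it assumes every branch is two-way and every path costs the same):

```python
def eager_execution_cost(depth):
    """Paths needed and fraction of work discarded when all paths are run eagerly.

    Assumes two-way branches and equal-cost paths (a simplification).
    """
    paths = 2 ** depth
    wasted_fraction = 1 - 1 / paths  # exactly one path is the correct one
    return paths, wasted_fraction

for depth in range(1, 6):
    paths, wasted = eager_execution_cost(depth)
    print(f"depth {depth}: {paths} paths, {wasted:.1%} of the work thrown away")
```

Each extra level of branch depth doubles the hardware needed, which is the exponential-for-linear trade described above.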

Having read the articles that were easy to get to, and the abstract of the PhD student: this is buzzword bollocks. There is no innovation in what they have done. As other people have pointed out this is a vector / datastream architecture. It's not a very good one at that. Although it has the "potential" to scale to teraflops, so does my toothbrush. On a 130 nm process they can fit 2 cores with 32-wide dispatch clocked at 500 MHz. My 7800 is fab'ed on a 130 nm process with 24*4*4 = 384 operation wide vector dispatch. This prototype would hit about 16 billion ops/sec, versus 180 Gflop on the 7800. This is a long way from teraflops, and doesn't convince me that it can scale.
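
For anyone checking the arithmetic, a back-of-the-envelope version in Python. It assumes the "32-wide" figure means 2 cores at 16 ops/cycle each (as in the press release quoted elsewhere in this thread) and takes the 180 Gflop number for the 7800 as given in this post; both are peak figures, not measurements.

```python
# Peak throughput of the TRIPS prototype, from the figures in this post.
cores = 2
ops_per_cycle_per_core = 16      # assumption: 32-wide total = 2 x 16
clock_hz = 500_000_000           # 500 MHz

trips_peak_ops = cores * ops_per_cycle_per_core * clock_hz

# GPU figure as claimed in the post (not independently verified here).
gpu_peak_flops = 180_000_000_000

ratio = gpu_peak_flops / trips_peak_ops
print(f"TRIPS prototype peak: {trips_peak_ops / 1e9:.0f} billion ops/sec")
print(f"GPU claim is {ratio:.2f}x that")
```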

As the 7800 is close to a systolic model there is a limited class of programs that can be executed; but those that are in that class exhibit (near)perfect parallelism and so have zero hit from memory access costs. Actually the internal bandwidth on the 7800 is a bottleneck for some computations but I'm just going for coarse detail here.

EDGE appears to mix and match ideas from several parallel designs, every one of which suffers from hard code generation problems. I suspect that the only sample applications that hit 32 ops / cycle are media apps (or dataflow problems as they used to be called), which normal architectures run at high speed anyway.

Interesting research, as it's always good to see people explore different designs, but it sounds overhyped and I believe that it has zero commercial appeal. Finally, as a sidenote, you are right about cache latencies being a memory defect rather than processor but there are ways around it. If you are willing to limit yourself to a certain class of applications (roughly the same one that executes well on most parallel architectures such as this, or GPUs) then you can completely avoid the latency. This provides a much bigger performance hike than any other technique as memory latency is a dominating factor in most runtimes. The only snag is that it is very hard to do, requires different fabrication technology (largely solved now), and lots of compiler advances... If you're interested then google for intelligent ram. It's about a decade of research now...

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Marketting hype? by tantaliz3 · 2007-04-24 12:12 · Score: 1

...work applications, 4 instructions is almost always the maximum number of instructions any program can concurrently issue, ...

Ya, and NO ONE will ever need more than 124kb of Memory. Phssh
Re:Marketting hype? by SiJockey · 2007-04-24 15:56 · Score: 4, Informative

Actually, there is much more parallelism (more than 4 ops/cycle) available in many of these applications, but you correctly observe that many of these ancillary features (branch mispredictions, cache misses, etc.) chip away at the achieved parallelism. The TRIPS ISA and microarchitecture (which is, as you correctly point out, a variant of an OOO "superscalar" processor) has numerous features to try to mitigate many of these features ... up to 64 outstanding cache misses from the 1,024-entry window, aggressive predication to eliminate many branches, a memory dependence predictor, and direct ALU-ALU communication for making data dependences more efficient. The most important difference is in the ISA, which allows the compiler to express dataflow graphs directly to the hardware, which will work best (compared to convention) in ultra-small technologies where the wires are quite slow. To get a similar dependence graph in a RISC or CISC ISA, a superscalar processor must reconstruct it on the fly, instruction by instruction, using register renaming and issue window tag broadcasting. Thanks for reading.
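
As a rough illustration of what "reconstructing the dependence graph on the fly" involves, here is a small Python sketch that walks a register-based instruction stream and recovers the producer-to-consumer edges by tracking the last writer of each register, which is roughly the bookkeeping register renaming does. The three-operand tuple format and the `build_dependence_graph` name are assumptions for this example, not a model of any real ISA or of the TRIPS hardware.

```python
def build_dependence_graph(program):
    """Recover true (read-after-write) dependences from register names.

    program: list of (dest, src1, src2) register names, in program order.
    Returns a list of (producer_index, consumer_index) edges.
    """
    last_writer = {}  # register name -> index of the instruction that last wrote it
    edges = []
    for i, (dest, *srcs) in enumerate(program):
        for src in srcs:
            if src in last_writer:
                edges.append((last_writer[src], i))
        last_writer[dest] = i  # later readers of dest now depend on instruction i
    return edges

program = [
    ("r1", "r2", "r3"),  # 0: r1 = r2 op r3
    ("r4", "r1", "r5"),  # 1: reads r1 written by 0
    ("r1", "r6", "r7"),  # 2: reuses the name r1, but does not depend on 0 or 1
    ("r8", "r1", "r4"),  # 3: reads r1 from 2 and r4 from 1
]
edges = build_dependence_graph(program)
print(edges)  # prints [(0, 1), (2, 3), (1, 3)]
```

In an ISA that names consumers directly, these edges would already be in the instruction encoding, so no lookup by register name would be needed to find them.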

--
--+-- Doug Burger, UT-Austin Computer Sciences
Re:Marketting hype? by David Greene · 2007-04-24 16:17 · Score: 1

Which brings up a good question: what are they counting as ILP? It's been a while since I read the papers so I'll have to go and double-check, but if they're counting predicated instructions that get squashed in their "instructions simultaneously executing" number, it's not a fair comparison to current processors, which report IPC as instructions that actually do work.

That said, IIRC TRIPS uses some interesting cache technology to help with speculation and relies on the compiler much more than a standard OOO architecture. If you're running the right kind of codes, there's a rather large amount of parallelism out there to exploit. Even if you're NOT running the right kinds of codes, you can still get a surprising amount of parallelism by removing the stack pointer if your branch prediction (or predication) and latency tolerance is good enough. I don't recall whether TRIPS makes use of this available parallelism or not.

--
Re:Marketting hype? by David Greene · 2007-04-24 16:26 · Score: 4, Interesting

...this is buzzword bollocks.

No, it isn't. The TRIPS group has done some really interesting things with compilers, for example. They've managed to have the compiler break up code into packets and schedule them on the processor array so that dependencies flow nicely across the grid. That is not an easy problem to tackle. This is very good research.

I believe that it has zero commercial appeal.

That's not the point of research. The point of research is to explore problems no one has tackled before, of course always with an eye toward future technology trends.

--
Re:Marketting hype? by Anonymous Coward · 2007-04-24 16:32 · Score: 0

Actually, if you understand what a Data Flow processor is, you would understand that the problems of cache misses, data dependencies, and branches are all but eliminated. In a data flow processor, your program executes as if each instruction is a thread. Since the processor can fork and join threads in just two instructions, and do context switches every instruction at almost no cost, you can hide latencies with enough in-flight "threads". I am actually researching a processor that tries to mimic the data-flow structure in an Intel-like processor. It still has the overhead of switching threads, but it is orders of magnitude lower than what it takes to do it in software. And do you know what is really holding back the Von Neumann style processors? It is the stack. With data-flow you can execute every function in your program in parallel if there are no data dependencies, because you can eliminate the stack.

The real problem with designing a processor like this is not the hardware. It is extremely easy to make a fast data-flow processor. The trick is designing software. The x86 architecture sucks, but why is it used? Because we have tons of resources invested in the software that runs on these processors. Our desktop CPUs are hampered by the fact that they have to support out-of-date instructions. If we switched to a data flow model, we would have to waste a lot of resources either redesigning software or, if you're lucky, just recompiling. Programmers would have to rethink programming. What it will really take to get this architecture off the ground will be a full C/C++ compiler.

If you want a good read on how data-flow processors should work, do a search for "Can dataflow subsume". It should be the first article that shows up in google. It outlines what a dataflow architecture could be like.
Re:Marketting hype? by Klintus Fang · 2007-04-24 17:12 · Score: 1

I wish the article linked to at the top of this discussion contained a discussion in those terms rather than what it said, but that's an aside. ;-)

Much of what you list is already in current processors (aggressive branch prediction, memory dependence predictors, direct ALU-ALU bypasses). What current processors do lack is the deep 1024 out-of-order buffer and the large 64 entry cache miss buffers. But current applications don't really benefit much from going that deep and on the few workloads that theoretically could, a compiler that is very aware of the cache sizes and latencies could probably schedule carefully enough to avoid the need for such a deep OOO buffer.

I do understand though that you are saying that your ISA is designed to work around that. I'm dubious that an ISA can make that kind of difference, but maybe it could.

I am wary of an approach that depends so much on the compiler. This can lead to backward compatibility issues. Users expect programs that ran well on revision X of the processor to run better on revision X+1 of the processor and will be annoyed if they have to rebuy their software because they have upgraded their processor. That doesn't have to be an issue. It would depend upon which market these processors are being sold into (assuming something like them is ever sold). It would probably also depend upon the degree to which the information the compiler is communicating through the ISA (with which I am totally unfamiliar) depends upon the compiler's knowledge of the low level details of the internal hardware and whether it needs to be reconstructed (in your case: recompiled) when the hardware changes, in order to perform well.

That particular point is something that on-the-fly construction of the dependency graph in-hardware deals with very well. It gives you a buffer against poorly written or poorly compiled code (which some users are going to run no matter what you may intend) and lets you deal fairly well performance-wise with legacy code that may be decently written but poorly optimized for new hardware. That is something that future ISA's will have to deal with if they ever intend to replace the ISA's that are currently in widespread use (in my only partially educated opinion).

Don't know if yours does though. Just my thoughts.

--
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
Re:Marketting hype? by Hal_Porter · 2007-04-24 17:53 · Score: 1

The most important difference is in the ISA, which allows the compiler to express dataflow graphs directly to the hardware, which will work best (compared to convention) in ultra-small technologies where the wires are quite slow. To get a similar dependence graph in a RISC or CISC ISA, a superscalar processor must reconstruct it on the fly, instruction by instruction, using register renaming and issue window tag broadcasting. Thanks for reading.

That reminds me of EPIC aka Itanium.

It seems like the original RISC idea was to design the ISA so it was easy to build a scalar, pipelined processor. But in a modern superscalar chip, scheduling lots of instructions requires huge amounts of hardware, and that tends to limit clock speed. x86 chips have figured out how to break x86 instructions into something which can be executed efficiently, but both x86 and RISC chips are held back by scheduling instructions.

So EPIC tried to move the scheduling logic back to compile time. Which all sounded like a good idea in theory, but in practice EPIC seemed to be a massive disappointment. I remember reading that mid range x86 outperformed them in SpecInt when both were running native code for example, despite the fact that mid range x86 chips were much cheaper to buy, and presumably to manufacture. Even RISC back in its prime (e.g. Alpha) had a performance advantage in this sort of test, though nowhere near enough to take over.

So where do you think EPIC went wrong? What does your architecture get right that Intel/HP didn't?

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Re:Marketting hype? by poot_rootbeer · 2007-04-25 01:32 · Score: 1

As the 7800 is close to a systolic model there is a limited class of programs that can be executed; but those that are in that class exhibit (near)perfect parallelism and so have zero hit from memory access costs. Actually the internal bandwidth on the 7800 is a bottleneck for some computations but I'm just going for coarse detail here.

I had no idea Atari's third 8-bit console was so powerful. It's too bad they had to shelve the system when the market crash hit, and never gained back the developer or user base that Nintendo snatched up a couple years later.
Re:Marketting hype? by Quantam · 2007-04-25 06:20 · Score: 1

"Based on the article, "TRIPS" is nothing more than an Out-Of-Order (OOO) SuperScalar based processor. So unless the article is grossly simplifying (possible), this is nothing but a PR stunt."

I'd call it OOE perfected. EDGE allows OOE on scales orders of magnitude larger than current architectures can (and a few other benefits), using less (and less power consuming) hardware. This is accomplished through a rather interesting paradigm inversion (I haven't seen anything like it before, though I don't exactly scour comp-sci journals looking for every pet project of some professor), as windows this large would fry CPUs using current OOE techniques. It's definitely an interesting invention, but we'll see if it's practical enough to ever justify production (for example, it seems to me that the limitations on branches would make the code rather bloated; but it's possible they've already figured out a solution to this, and I'm not aware of it).

--
You have tried to support your argument with faulty reasoning! Go directly to jail; do not pass Go, do not collect $200!
Re:Marketting hype? by smallfries · 2007-04-25 07:33 · Score: 1

Yeah, it's been under appreciated :)

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Marketting hype? by smallfries · 2007-04-25 07:40 · Score: 1

Apart from your selective misquoting you haven't really said anything. From your other posts I would guess that you are (loosely) associated with the project / people on it. Please back up your claim about breaking code up into packets (preferably with a paper citation), because if they have done that I would like to read it.
That's not the point of research

From the way that you misquoted me and then attacked a strawman, either you don't understand what research is about, or you do know but scoring points is more important to you. At no point did I claim that research must be directed at commercial results. As someone who did his PhD in an area that has no (foreseeable) commercial application I don't really need an explanation of this from you.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Marketting hype? by David+Greene · 2007-04-25 08:35 · Score: 1

Apart from your selective misquoting
Well, let's see:

You:

Having read the articles that were easy to get to, and the abstract of the PhD student: this is buzzword bollocks. There is no innovation in what they have done.

Me quoting you:

...this is buzzword bollocks.

Seems a pretty accurate sense of what you said.

Again, you:

Interesting research, as it's always good to see people explore different designs, but it sounds overhyped and I believe that it has zero commercial appeal.

And me quoting you:

I believe that it has zero commercial appeal.

Again, your original sentence concentrated on the perceived commercial value of the project, not the fact that it's a research project. You intended to shift the debate from one of exploring the design space in a relatively unfettered manner to whether the project serves some narrow economic interest. You did the same when comparing TRIPS to existing commercial architectures when running media applications, ignoring the actual future problems that TRIPS is designed to overcome. I simply focused on the argument you actually made.

Are you now unwilling to stand behind that argument? There's nothing wrong with changing your argument but there is something wrong with doing so while trying to obfuscate the change in debate with attacks.

From your other posts I would guess that you are (loosely) associated with the project / people on it.
You would guess wrong.

Please back up your claim about breaking code up into packets (preferably with a paper citation), because if they have done that I would like to read it.
Perhaps you would care to examine the TRIPS publications page where there are several papers on the compiler available.

As someone who did his PhD in an area that has no (foreseeable) commercial application I don't really need an explanation of this from you.
Yale Patt, is that you?!?

--
Re:Marketting hype? by smallfries · 2007-04-25 10:43 · Score: 1

Okay, there are two lines of argument here that I'll summarise (correct me if you disagree):

1. Is TRIPS an advance on academic research in the area?
2. Does TRIPS describe a shift in commercially ready parallel platforms?

From what you've said you seem to think that the second point is down to me shifting the argument. I would disagree, the article summary and press releases from the team seem to support the idea that they think they have something that radically shifts the current state of the art. From what I've read in scanning their publications I would disagree with this point. I'm not sure if you support it or not, but you do seem to say that this is not an issue, when the slashdot article would suggest otherwise.

From a commercial point of view I don't think that what they've done will shift the market. Over the years there have been many exotic parallel architectures that have not shifted the mainstream because their gains do not apply to real-world code. In the compiler papers section there is some evidence that they have made an advance here. In particular it's interesting that they can accelerate gzip, given the workload in that benchmark.

Of course there have also been exotic architectures that have become mainstream - GPUs are one example. So I don't think their research has market potential to replace the current mainstream x86 variants. This does not mean that I don't think that they've done anything interesting, as I said, and you decided not to quote.

From a research point of view, they have two advances; a new architectural layout that exploits parallelism with a certain tradeoff, and compiler technology that restructures conventional code to take advantage of this layout. They do argue quite heavily in each paper that it is a very different architecture to a dataflow machine, or a vector layout. I'm not convinced about this point, mainly because their arguments are quite buzzword heavy, and I don't see any substance.

The compiler side is more interesting, but it doesn't appear to be validated. They are comparing their performance on benchmarks against a standard compiler. They should be comparing the speedup they achieve against rival approaches. In particular this would give some idea of how efficiently they are exploiting the transistor budget with their design. This is standard in the literature, but it doesn't make it correct to phrase the validation that way.

I hadn't heard of Yale Patt but it seems quite a nice comparison (for me). My background is more in language design and compilation than in architecture. I've read quite a lot of papers on random exotic designs for parallel architectures that haven't gone anywhere, so I have a healthy (but possibly misplaced) cynicism on the topic.

My comments about media applications were a guess that their architecture would require coarse-grained data parallelism to show off any real gains in performance. Their description of loop mining and tiling techniques seems to back that up, although they show interesting results on fairly sequential benchmarks.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php

TRIPS by HTH+NE1 · 2007-04-24 08:40 · Score: 1

TRIPS (obligatory back-formation given in the article)

Is that to make people RTFA (Read The F[ine] Article), or because "Tera-op, Reliable, Intelligently adaptive Processing System" was 13 more characters than the submitter wanted to copy and paste?

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?

Re:TRIPS by glwtta · 2007-04-24 08:56 · Score: 3, Insightful

I think it's mostly because the backronym is contrived and silly.

--
sic transit gloria mundi

One trillion calculations per second by 2012 by xocp · 2007-04-24 08:41 · Score: 3, Informative

A link to the U of Texas project website can be found here.

Key Innovations:

Explicit Data Graph Execution (EDGE) instruction set architecture
Scalable and distributed processor core composed of replicated heterogeneous tiles
Non-uniform cache architecture and implementation
On-chip networks for operands and data traffic
Configurable on-chip memory system with capability to shift storage between cache and physical memory
Composable processors constructed by aggregating homogeneous processor tiles
Compiler algorithms and an implementation that create atomically executable blocks of code
Spatial instruction scheduling algorithms and implementation
TRIPS Hardware and Software

Re:One trillion calculations per second by 2012 by xocp · 2007-04-24 08:51 · Score: 3, Informative

DARPA is the primary sponsor...
Check out this writeup at HPC wire.
A major design goal of the TRIPS architecture is to support "polymorphism," that is, the capability to provide high-performance execution for many different application domains. Polymorphism is one of the main capabilities sought by DARPA, TRIPS' principal sponsor. The objective is to enable a single processor to perform as if it were a heterogeneous set of special-purpose processors. The advantages of this approach, in terms of scalability and simplicity of design, are obvious.

To implement polymorphism, the TRIPS architecture employs three levels of concurrency: instruction-level, thread-level and data-level parallelism (ILP, TLP, and DLP, respectively). At run-time, the grid of execution nodes can be dynamically reconfigured so that the hardware can obtain the best performance based on the type of concurrency inherent to the application. In this way, the TRIPS architecture can adapt to a broad range of application types, including desktop, signal processing, graphics, server, scientific and embedded.
Re:One trillion calculations per second by 2012 by faragon · 2007-04-24 09:31 · Score: 1

Explicit Data Graph Execution (EDGE) instruction set architecture?
Scalable and distributed processor core composed of replicated heterogeneous tiles?
Non-uniform cache architecture and implementation?
...

Well, very disappointing when compared to other modern microprocessor architectures. Don't get me wrong, I love computer architecture, and the design seems interesting, but the "over-hype" is discouraging.
Re:One trillion calculations per second by 2012 by maxwell+demon · 2007-04-24 10:16 · Score: 1

Explicit Data Graph Execution (EDGE) instruction set architecture?

Exactly. As is explicitly stated in the PDF linked from this comment by volsung, TRIPS is an implementation of EDGE.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:One trillion calculations per second by 2012 by wolf369T · 2007-04-24 23:35 · Score: 0

Will I also get modded informative if I post yet another link to http://www.cs.utexas.edu/~trips/?
Thanks! (That server just won't die, now would it?...)

Must...resist...obvious...joke by jimicus · 2007-04-24 08:42 · Score: 2, Funny

Imagine a beowulf cluster of these!

Re:Must...resist...obvious...joke by ookabooka · 2007-04-24 09:34 · Score: 1

Imagine a beowulf cluster of these!

Care for a game of chess? Nothing drives innovation in processing power more than a good game of chess :-D

--
If you are about to mod me down, keep in mind that this post was most likely sarcastic.
Re:Must...resist...obvious...joke by RedElf · 2007-04-24 09:53 · Score: 1

What!?!? You mean you don't already have a beowulf cluster of these?

--
You know, I have one simple request. And that is to have sharks with frickin' laser beams attached to their heads!
Re:Must...resist...obvious...joke by shmlco · 2007-04-24 11:02 · Score: 1

How about a nice game of chess?

--
Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
Re:Must...resist...obvious...joke by ookabooka · 2007-04-24 11:53 · Score: 1

I prefer my chess games to be realtime. :-p Having your opponent drop the game for something "more important" is kinda annoying.

--
If you are about to mod me down, keep in mind that this post was most likely sarcastic.

Useless! by iminplaya · 2007-04-24 08:42 · Score: 1

Here: from the horse's mouth. Plus we don't have to keep that damn digg thing. Come on, guys. A little less fluff please.

--
What?

Let me be the first to say.... by einhverfr · 2007-04-24 08:43 · Score: 1

TRIPpy, dude....

--

LedgerSMB: Open source Accounting/ERP

Ix86 by nurb432 · 2007-04-24 08:44 · Score: 1

So I assume it's software compatible with 90% of the code that the 'general public' uses?

It did say 'general purpose' and if you try to create something better but different, you get slapped down eventually (like PowerPC Apples).

--
---- Booth was a patriot ----

Re:Ix86 by Anonymous Coward · 2007-04-24 08:56 · Score: 0

Except that it wasn't better. Other than that you are spot on!
Re:Ix86 by convolvatron · 2007-04-24 08:59 · Score: 4, Insightful

you are absolutely right. no one should ever do any research into
something which doesn't ultimately look like an x86.
Re:Ix86 by jarom · 2007-04-24 10:12 · Score: 1

Eh, just slap on a few million transistors to make the ISA translation, and you are good to go.

--
This signature is far too complex to have been created by chance.
Re:Ix86 by ghoul · 2007-04-24 10:21 · Score: 1

I took Prof Burger's course and we studied the TRIPS processor. There are two parts to the project. One is the chip team and the other is the software team led by Dr Katherine McKinley. The software team has developed emulators so that current code can run on the TRIPS processor. Of course emulation is never as good as native execution but it does provide an upgrade path. The key thing to notice is that the upgrade path has been part of the thinking from the beginning.

--
**Life is too short to be serious**
Re:Ix86 by nurb432 · 2007-04-24 12:24 · Score: 1

I never meant that, I only meant that anything else seems to be a commercial dead end due to the market dominance.

Having a compatibility layer will help prevent it from being a doomed project/product.

--
---- Booth was a patriot ----
Re:Ix86 by Anonymous Coward · 2007-04-24 13:22 · Score: 0

There are lots of uses for computers besides making Firefox run faster. Binary compatibility is irrelevant to many people involved in scientific computing, for example.
Re:Ix86 by 644bd346996 · 2007-04-24 13:22 · Score: 1

Somewhere in their PDFs, it says that the EDGE architecture should be better at emulating x86 than a VLIW or Itanic processor. If they can get some dynamic recompilation going, they should be good. (Though they will still have to scale beyond 500 MHz.) It does seem pretty interesting having 16 ALUs per core, and not using registers for intermediate values.
Re:Ix86 by Anonymous Coward · 2007-04-24 18:44 · Score: 0

It is still possible to bring a new kind of processor architecture to market and win - the "only" requirement is that you make a big enough breakthrough in performance that it becomes worthwhile to port and/or emulate x86 software. As the various incarnations of Cray demonstrate, traditional supercomputer applications (science and national security) can sustain a business only so long, and the hard problem is to achieve the superior performance at a price that is competitive with commodity hardware. But IF you can sell ten times x86 performance (in general-purpose codes) at twice the price, market success is possible.

The closest (and still not very good) example I can think of is the Alpha in the 1990s, when the em86 emulator was comparable to top-end x86 and native Alpha code was 50% faster, but this was not nearly enough to convince the market, and Intel improved fast enough that Alpha couldn't compete in the end.

It is also likely that a breakthrough in performance is what the Transmeta guys were originally aiming at, but ultimately they couldn't compete in performance and had to market themselves as a low-power device, but of course we all know that they didn't beat Intel/AMD there, either.

However, the basic problem in creating a new kind of architecture that can emulate x86 faster than real x86 hardware is that Intel (and AMD although their resources are a lot less) can play that game too, by making a new kind of x86 chip that implements the x86 instruction set as an emulation. This is after all what they did to compete with the originally RISC-only technique of out-of-order processing.
Re:Ix86 by Criffer · 2007-04-24 21:00 · Score: 1

However, the basic problem in creating a new kind of architecture that can emulate x86 faster than real x86 hardware is that Intel (and AMD although their resources are a lot less) can play that game too, by making a new kind of x86 chip that implements the x86 instruction set as an emulation.

In fact, this is exactly how modern Intel and AMD chips work. Internally, they are RISC, with an instruction reordering unit and a vector unit. All the horrible instructions in the x86 ISA are microcoded, allowing the greater part of silicon to be dedicated to making particular instructions fast (media processing, mainly). There isn't actually hardware for things like the AAA instruction, since nobody cares about the speed of BCD calculations; there is hardware for single-cycle floating-point dot-products. The minuscule number of registers everyone complains about is irrelevant, since internally the out-of-order execution unit has dozens of shadow registers - only calculations which are correctly predicted actually make it as far as user-visible registers.
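
To make the shadow-register point concrete, here is a rough sketch of register renaming in Python. It is only an illustration under simplifying assumptions: the `rename` function and the tuple encoding of instructions are made up for this example, the pool of physical registers is treated as unlimited, and nothing is ever freed or rolled back after a misprediction, which real hardware has to handle.

```python
def rename(instrs, num_arch_regs=4):
    """Rename architectural registers to fresh physical registers.

    instrs: list of (dest, src1, src2) architectural register numbers.
    Returns the same instructions expressed in physical register numbers.
    Assumption: an unlimited supply of physical registers (no free list).
    """
    # Initially architectural register r lives in physical register r.
    mapping = {r: r for r in range(num_arch_regs)}
    next_phys = num_arch_regs
    renamed = []
    for dest, src1, src2 in instrs:
        # Sources read whichever physical register currently holds the value.
        p1, p2 = mapping[src1], mapping[src2]
        # Every write gets a brand-new physical register, so a later reuse
        # of `dest` no longer conflicts with earlier readers or writers.
        mapping[dest] = next_phys
        renamed.append((next_phys, p1, p2))
        next_phys += 1
    return renamed

# r0 is reused as a destination, as it might be on a register-starved ISA.
program = [
    (0, 1, 2),   # r0 = r1 op r2
    (3, 0, 1),   # r3 = r0 op r1   (true dependency on the first write)
    (0, 2, 2),   # r0 = r2 op r2   (reuses r0, but independent of the above)
    (3, 0, 3),   # r3 = r0 op r3
]
renamed = rename(program)
for ins in renamed:
    print(ins)
```

After renaming, the third instruction writes a different physical register than the first, so it can run alongside the second instead of waiting for it. Only the true dependencies remain.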

Given then, the fact that Intel and AMD are already essentially emulating x86 in (fast, parallelised, speculative) hardware, I doubt that anyone else can come up with a faster chip by doing basically the same thing. Transmeta tried this and failed.

Since then there have been a number of attempts at removing the out-of-order unit, basically transferring execution control to the compiler, such as these people are doing. Intel's attempt, EPIC, was a disaster. I think the idea is folly, as advances in compiler design can't make old programs faster (unless you're using open source), whereas a faster execution unit with better branch prediction and parallel speculative execution will make all programs running on the chip faster (and therefore sell more chips).

TRIP UP by Anonymous Coward · 2007-04-24 08:45 · Score: 0

How about the TRIP UP processor, that needs a new set of feet. Or, how about the DRIPS processor that needs a paper towel to blot it dry. Or, how about, the NIPS processor, that needs nip/tuck. Or, how about the SHITS processor, that needs toilet paper. Little on the details, much on the promises is usually a bad sign.

Gets rid of the register-file by DrDitto · 2007-04-24 08:47 · Score: 5, Insightful

The EDGE architecture gets rid of relying on a single register file to communicate results between instructions. Instead, a producer-consumer ISA directly sends results to one of 128 instructions in a superblock (sort of like a basic block, but larger). In this way, hopefully more instruction-level parallelism can be extracted because superscalars can't really go beyond 4-wide (8-wide is a stretch...DEC was attempting this before Alpha was killed). Nice concept, but it doesn't solve many pressing problems in computer architecture, namely the memory wall and parallel programmability.
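
A minimal sketch of the producer-consumer idea, in Python rather than any real EDGE encoding: each instruction names the instructions that consume its result instead of naming a destination register, and an instruction fires as soon as both operands have arrived. The `Instr` class, the `run_block` function and the opcode names are hypothetical and are not taken from the TRIPS ISA.

```python
from dataclasses import dataclass, field
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

@dataclass
class Instr:
    op: str          # opcode name (hypothetical)
    targets: list    # (consumer index, operand slot) pairs
    operands: dict = field(default_factory=dict)  # slot -> value, filled as producers fire

def run_block(instrs, inputs):
    """Execute a block in dataflow order.

    inputs: (consumer index, slot, value) triples injected at block entry.
    Returns {instruction index: result} for instructions with no targets.
    """
    for idx, slot, value in inputs:
        instrs[idx].operands[slot] = value
    # An instruction is ready once both of its operand slots are filled.
    ready = [i for i, ins in enumerate(instrs) if len(ins.operands) == 2]
    outputs = {}
    while ready:
        i = ready.pop()
        ins = instrs[i]
        result = OPS[ins.op](ins.operands[0], ins.operands[1])
        if not ins.targets:
            outputs[i] = result
        # Deliver the result straight to each consumer's operand slot.
        for tgt, slot in ins.targets:
            instrs[tgt].operands[slot] = result
            if len(instrs[tgt].operands) == 2:
                ready.append(tgt)
    return outputs

# (a + b) * (c - d): no register names appear anywhere in the block.
block = [
    Instr("add", targets=[(2, 0)]),
    Instr("sub", targets=[(2, 1)]),
    Instr("mul", targets=[]),
]
result = run_block(block, [(0, 0, 2), (0, 1, 3), (1, 0, 10), (1, 1, 4)])
print(result)
```

The add and the sub have no ordering between them, so the order in which `ready` is drained does not change the answer. That is the property the hardware would exploit.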

Re:Gets rid of the register-file by bishiraver · 2007-04-24 10:03 · Score: 1

That's because they designed it in 2004, built the prototype in 2005, and some blogspam idiot is publicizing it in 2007.
Re:Gets rid of the register-file by flaming-opus · 2007-04-24 10:11 · Score: 1

Well, it gets rid of the isa-visible register file. That doesn't mean there aren't sram cells in there holding onto data. Don't confuse architecture and implementation.
Re:Gets rid of the register-file by Tokerat · 2007-04-24 11:50 · Score: 1

Well, it gets rid of the isa-visible register file.
Anything that makes technical jargon sound less like Jar-Jar Binks is a win in my book.

--
CAn'T CompreHend SARcaSm?
Re:Gets rid of the register-file by DrDitto · 2007-04-24 13:10 · Score: 1

Fine. It gets rid of the complex, non-scalable register-bypass logic in the instruction window of an out-of-order superscalar.
Re:Gets rid of the register-file by DrDitto · 2007-04-24 13:13 · Score: 1

yup. in fact they came up with EDGE before 2004.
Re:Gets rid of the register-file by julesh · 2007-04-26 05:19 · Score: 1

Nice concept, but it doesn't solve many pressing problems in computer architecture, namely the memory wall and parallel programmability.

No, but what it does do is completely eliminate the need for register renaming, which (as I understand it) consumes a significant proportion of the silicon in most modern OOE-capable processors. This saved area can then be diverted into having either more execution units or more cache, whichever best helps with the target problem domain. It also makes thread-level parallelism easier to achieve, because the register renaming cache doesn't have to be duplicated for each executing thread.
Re:Gets rid of the register-file by DrDitto · 2007-04-26 10:02 · Score: 1

You make a good point in that it does eliminate register-renaming logic. However I don't see how register-renaming is more resource-intensive than the wakeup/dependency logic for the bypassing of values from functional units to waiting instructions ready to fire. As the instruction window gets larger, the bypassing and the size of the wakeup CAMs get out of control. ??
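
A back-of-the-envelope sketch of why the wakeup logic grows so quickly, assuming the textbook model in which every waiting instruction compares each of its source tags against every result tag broadcast in a cycle. The function name and the example window sizes are illustrative only and are not measurements of any real chip.

```python
def wakeup_comparators(window_size, issue_width, srcs_per_instr=2):
    # Each window entry holds `srcs_per_instr` source tags, and each tag is
    # compared against every result tag broadcast that cycle.
    return window_size * srcs_per_instr * issue_width

counts = {}
for window, width in [(32, 4), (128, 8), (1024, 16)]:
    counts[(window, width)] = wakeup_comparators(window, width)
    print(window, width, counts[(window, width)])
```

Under this model the comparator count scales with the product of window size and issue width, before even counting the wire delay of broadcasting tags across a larger window.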

I want one by Anonymous Coward · 2007-04-24 08:47 · Score: 0

But when are they likely to be ready?

This is cool! by adubey · 2007-04-24 08:50 · Score: 3, Informative

The link has NO information.

The PDF here: has more information about EDGE.

The basic idea is that CISC/RISC architectures rely on storing intermediate data in registers (or in main memory on old skool CISC). EDGE bypasses registers: the output of one instruction is fed directly to the input of the next. No need to do register allocation while compiling. I'm still reading the PDF, this sounds like a really neat idea, though.

The only question is, will this be so much better than existing ISA's to eventually replace them? -- even if only for specific applications like high-performance computing.

Re:This is cool! by treeves · 2007-04-24 09:05 · Score: 2, Insightful

If it's so cool, why did it take three years for us to hear about it? I'm really asking, not just trolling.

--
...the future crusty old bastards are already drinking the Kool-Aid.
Re:This is cool! by Angstroem · 2007-04-24 09:10 · Score: 1

Because you read the wrong publications. Try IEEE and ACM digital library.

Doug Burger's work has been known to computer scientists for years...
Re:This is cool! by BrewerDude · 2007-04-24 09:30 · Score: 1

Hmm. Interesting.

I wonder how this differs from the dataflow architectures of the early 90s?
Re:This is cool! by treeves · 2007-04-24 09:41 · Score: 1

That's what I mean. The link given by the poster I replied to was to an article from IEEE Computer from Jul 2004. I'm not a computer scientist so I don't regularly read those journals. But the question is, is it really "news for nerds" if it's three years old?

--
...the future crusty old bastards are already drinking the Kool-Aid.
Re:This is cool! by treeves · 2007-04-24 09:44 · Score: 1

Sorry for replying to my own post, but I guess the answer is that they devised the architecture three years ago, but just now have the actual thing in silicon. I should read more carefully or not rely on short-term memory of 30 seconds ago!

--
...the future crusty old bastards are already drinking the Kool-Aid.
Re:This is cool! by sofla · 2007-04-24 09:57 · Score: 1

The basic idea is that CISC/RISC architectures rely on storing intermediate data in registers (or in main memory on old skool CISC). EDGE bypasses registers: the output of one instruction is fed directly to the input of the next. No need to do register allocation while compiling. I'm still reading the PDF, this sounds like a really neat idea, though.
What I liked even more was the idea of "execution blocks", where a given (processor pipeline width) worth of instructions are treated as an atomic "transaction"... clearly this is a nod to minimize the branch prediction penalty. I haven't been keeping up with hardware arch enough to know if this is a new idea or not, but it was new to me.

The only question is, will this be so much better than existing ISA's to eventually replace them? -- even if only for specific applications like high-performance computing.
I think it may be a contender, or at least gain some popularity (like MIPS did in the 90's), if nothing else for the fact that the big limitation in the EDGE design appears to be in the ability for real programs to exploit parallelism. This is exactly the same problem we're facing with the latest batch of multi-core processors from Intel: all those processing units are nice, but it's a b**ch to keep them all busy.
Re:This is cool! by ghoul · 2007-04-24 10:27 · Score: 1

It's been in development for a while. Papers were published 3 years back but the actual working prototype just came out last year and after testing and debugging it was released to the public at a function this month. Just like AMD has been talking about Greyhound for 3 years but it won't release till June this year.

--
**Life is too short to be serious**
Re:This is cool! by KnowledgeKeeper · 2007-04-24 11:14 · Score: 0

EDGE bypasses registers: the output of one instruction is fed directly to the input of the next. No need to do register allocation while compiling.

Whoa, implementing unix pipes in hardware, low-level style :)

--
It is always better to be a first grade version of yourself than a second grade version of someone else.
Re:This is cool! by frank_adrian314159 · 2007-04-24 11:28 · Score: 1

The implementation still has to rely on some registers to hold the intermediate computations whether they are exposed in the ISA or hidden in the instruction dispatch unit. This doesn't seem that new. Tomasulo used a similar idea in the FP unit of the IBM 360/91 back in 1967. The dataflow idea was extensively mined at MIT in the eighties. This seems to be just an implementation of the latter that uses the former with a large number of functional units and reservation stations thrown in. Most modern microprocessors do something similar - the only difference is in scale. Remember that there are usually no really new ideas in computer architecture - the old ideas just get revisited with a couple new twists every twenty-five years or so.

--
That is all.
Re:This is cool! by Klintus+Fang · 2007-04-24 17:43 · Score: 1

But that is how current CISC/RISC processors function in practice. They all contain large bypass networks that feed data directly from the output of one instruction to the input of the next whenever they can. The register file is there, but for instructions on the critical execution path it is largely an abstraction and those critical instructions rarely ever directly read from the register file anyway.

I understand that this new EDGE ISA is apparently aiming to formalize that and abstract the register file away completely. But whether there is really any benefit to doing that formally or whether most of the benefit is already available in how current processors handle the register bypass network is an open question.

--
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
Re:This is cool! by adrianmonk · 2007-04-24 17:58 · Score: 1

The only question is, will this be so much better than existing ISA's to eventually replace them? -- even if only for specific applications like high-performance computing.

Or running Java. Or CLR.
Re:This is cool! by Anonymous Coward · 2007-04-24 19:19 · Score: 0

I'm assuming they meant if(x>7) in their code snippet instead of of(x>7) (that's a bad typo right there).

I scanned the article for numbers regarding performance speed, etc. and couldn't see anything (yet the of was too obvious).

Any articles with more code, and performance numbers?

I *am* a programmer (a very high level one at that) and to be honest I do not care "how" the processor orders and executes my instructions, I care about how "fast" it can do it.

Obviously I'm not writing anything to any processor, but I would like to know that the tools that I use (languages and their compilers) will be able to take advantage of some feature on some processor that my high-level-shallow programming mind cannot understand.

Moore's law immortal? by Manos_Of_Fate · 2007-04-24 08:51 · Score: 3, Interesting

It seems like for every "realist" claiming that Moore's law will soon hit a ceiling, I see another ZOMG Breakthrough! Lately, the question I've been asking myself is, "Will we ever surpass it?"

--
Isn't enough that I ruined a pony, making a gift for you?

Re:Moore's law immortal? by mandelbr0t · 2007-04-24 09:45 · Score: 1

Will we ever surpass it? Doubtful, unless we see a new hardware player burst onto the scene. AMD made quite a splash, but they certainly didn't have any potential, nor do they today, to outpace Moore's Law. Intel still drives the hardware market.

--
"Please describe the scientific nature of the 'whammy'" - Agent Scully
Re:Moore's law immortal? by maxwell+demon · 2007-04-24 10:04 · Score: 1

Moore's Law is about the transistor density on the chip. A new processor design may help getting more performance from the same transistor density, but it certainly doesn't do anything to increase the transistor density.

Since there's a finite atom density on a chip, the transistor density will inevitably stop growing eventually.

--
The Tao of math: The numbers you can count are not the real numbers.

TRIPS web page by kiick · 2007-04-24 08:53 · Score: 1

Here is the web page for TRIPS, straight from UT Austin:
http://www.cs.utexas.edu/~trips/

Enjoy.

Re:TRIPS web page by Auntie+Virus · 2007-04-24 08:59 · Score: 1

That's Captain Trips to you.
Don't fear the reaper....

--
Why yes, I *AM* new here. Why?

but... by Klintus+Fang · 2007-04-24 08:55 · Score: 4, Insightful

The motivations for this technology provided in the article ignore some rather basic facts.

They point out that current multi-core architectures put a huge burden on the software developer. This is true, but their claim that this technology will relieve that burden is dubious. They mention, for example, that current processing cores can typically only perform 4 simultaneous operations per-core, and imply that this is some kind of weakness. They completely fail to mention that the vast majority of applications running on those processors don't even use the 4 available scheduling resources in each core. In other words, the number of applications that would benefit from being able to execute more than 4 simultaneous instructions in the same core is vanishingly small. This is why most current processors have stopped at 3 or 4. Not because they haven't thought of pushing it beyond that, but because it is expensive, and because it yields very little return on the investment. Very few real-world users would see any performance benefit if the current cores on the market were any wider than 3 or 4. Most of those users aren't even using the 4 that are currently available.

Certainly the ability to do 1024 operations simultaneously in a single core is impressive. But it is not an ability that magically solves any of the current bottlenecks in multi-threaded software design. Most software application developers have difficulty figuring out what to do with multiple cores. Those same developers would have just as much (if not more) difficulty figuring out what to do with the extra resources in a core that can execute 1024 simultaneous operations.
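
To put a number on the "wider doesn't help" argument, here is a toy scheduler in Python. It assumes unit-latency instructions, a perfect front end and an acyclic dependency graph. The `cycles_needed` function and the eight-instruction example are invented for illustration and are not taken from any real workload.

```python
def cycles_needed(deps, width):
    """Greedy schedule of unit-latency instructions on a `width`-wide core.

    deps: dict mapping instruction -> set of instructions it depends on
          (assumed acyclic).
    Returns the number of cycles until every instruction has issued.
    """
    done = set()
    cycles = 0
    while len(done) < len(deps):
        # Ready means every input was produced in an earlier cycle.
        ready = [i for i in sorted(deps) if i not in done and deps[i] <= done]
        done.update(ready[:width])
        cycles += 1
    return cycles

# A small, fairly serial piece of code: the dependency chain is 5 deep.
deps = {
    0: set(), 1: set(),
    2: {0, 1},
    3: {2}, 4: {2},
    5: {3, 4},
    6: {5}, 7: {5},
}
timings = {w: cycles_needed(deps, w) for w in (1, 2, 4, 1024)}
print(timings)
```

In this example, going from 1-wide to 2-wide cuts the cycle count from 8 to 5, and every width beyond that gives exactly the same 5 cycles, because the dependency chain rather than the issue width is the limit.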

--
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot

Re:but... by $RANDOMLUSER · 2007-04-24 09:15 · Score: 3, Informative

Two words: loop unwinding. This critter is perfect to run all iterations of (certain) loops in parallel, which would be determinable at compile time.

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Re:but... by BillyBlaze · 2007-04-24 09:18 · Score: 1

The benefit per simultaneous operations is not necessarily monotonically decreasing.

Consider a loop with a medium-sized body, and iterations mostly independent. If there are enough simultaneous operations allowed to schedule multiple iterations through the loop at once, the loop could potentially run that many times faster. Now, with current designs, there aren't that many slots, and even if there were, the ISA makes it difficult to express this in a way that's useful to the processor. All we can do is OpenMP-like stuff where the programmer explicitly tells the runtime system to try to divide the loop between multiple threads. There's a lot of overhead, both in terms of context switching and programmer time.
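
A small sketch of what "iterations mostly independent" means, in Python. The two loop bodies and the `run_loop` helper are made up for illustration. Running the iterations in a different order is used here as a crude stand-in for running them at the same time; it shows the dependency but is not a proof of parallel safety.

```python
def scale_body(i, xs, out):
    # Iteration i reads xs[i] and writes out[i]; it never touches another iteration.
    out[i] = 2 * xs[i] + 1

def running_sum_body(i, xs, out):
    # Iteration i reads out[i - 1], which the previous iteration wrote.
    out[i] = xs[i] + (out[i - 1] if i > 0 else 0)

def run_loop(body, xs, order):
    out = [0] * len(xs)
    for i in order:
        body(i, xs, out)
    return out

xs = [3, 1, 4, 1, 5]
forward = list(range(len(xs)))
backward = forward[::-1]

scale_fwd = run_loop(scale_body, xs, forward)
scale_bwd = run_loop(scale_body, xs, backward)
sum_fwd = run_loop(running_sum_body, xs, forward)
sum_bwd = run_loop(running_sum_body, xs, backward)

print(scale_fwd == scale_bwd)   # independent iterations: order does not matter
print(sum_fwd == sum_bwd)       # loop-carried dependency: order matters
```

The first loop is the kind a compiler could hand to the hardware as many iterations in flight at once. The second has a loop-carried dependency, so extra issue slots do not help unless the algorithm is restructured.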

If, however, a different paradigm for an ISA can make it easier for compilers to communicate the dependencies to the processors, then the processors will be able to take advantage of that parallelism much more.
Re:but... by beef623 · 2007-04-24 09:25 · Score: 1

Know of any good programming resources that show how to take advantage of those extra resources?
Re:but... by Klintus+Fang · 2007-04-24 09:32 · Score: 3, Interesting

of course loop unwinding works fine... when you have a long loop. it does, though, have two problems. 1) it only works when you have very long loops with very few dependencies between consecutive iterations of the loop 2) even when it does work, it makes the code footprint of the application much bigger, which means you end up putting a lot more stress on your cache pipeline, requiring bigger caches and a wider fetch engine. And that all aside, what about the vast majority of code segments where massively parallelizable loops are not being executed? Loop unwinding isn't going to help at all for those.
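
To illustrate the first problem, here is a small Python sketch with two hypothetical functions (the names are invented for this example). One loop has independent iterations and unrolls cleanly; the other carries a value from one iteration to the next, so unrolling alone buys nothing.

```python
def scale_unrolled(xs, k):
    # Independent iterations, unrolled by 4: the four statements in the body
    # do not read each other's results, so they could all issue together.
    out = [0] * len(xs)
    n4 = len(xs) - len(xs) % 4
    for i in range(0, n4, 4):
        out[i] = xs[i] * k
        out[i + 1] = xs[i + 1] * k
        out[i + 2] = xs[i + 2] * k
        out[i + 3] = xs[i + 3] * k
    for i in range(n4, len(xs)):  # leftover iterations
        out[i] = xs[i] * k
    return out

def running_sum(xs):
    # Loop-carried dependency: every iteration needs the previous total,
    # so unrolling alone makes the code bigger without exposing parallel work.
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out
```

Note that the unrolled version is already about three times the size of the plain loop, which is the code footprint cost in point 2.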

--
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
Re:but... by Klintus+Fang · 2007-04-24 09:36 · Score: 1

That is exactly what Itanium's ISA does... Itanium is designed around the idea that the compiler knows best and provides a lot of tools to the compilers to enable the types of things you are talking about. But compilers either are not making use of it (or are unable to). It's not clear which.

--
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
Re:but... by shmlco · 2007-04-24 11:11 · Score: 1

"They completely fail to mention that the vast majority of applications running on those processors don't even use the 4 available scheduling resources in each core."

Yes, but that's primarily because most of those resources are specialized. One or two of those are integer paths, one's a branch system, another is floating point, and so on. If the current code block doesn't include any of those specialized instructions, then those particular execution paths sit there unused.

--
Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
Re:but... by Klintus+Fang · 2007-04-24 11:29 · Score: 1

yes. but that isn't usually why the resources go unused. The more common problem is data dependencies. No amount of widening the core is going to resolve those. What frequently happens on current applications is that there is an instruction in the scope of the OOO buffer that could use the unused resources but there is a data dependency between it and some other un-issued or unfinished instruction in the pipeline that prevents the dependent instruction from issuing. Deepening the OOO buffer can help with that, but that technique hits diminishing returns very fast (though I'll acknowledge that there are some academics who have published papers claiming otherwise [I don't agree with them]). More generally, the current ratios of integer to fp to branch to load/store units in most modern cores tend to reflect the typical mix that most applications currently in use will need. Those ratios are not picked at random. Architects analyze typical usage models when deciding how many of each type of pipeline to put in there. FP pipelines are one exception. FP pipelines draw too much power and that tends to be a limiting factor. But only a heavy floating point application would benefit from more of them and there aren't really that many users running those regularly.
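
A toy model of the point about data dependencies, as a Python sketch. It assumes every instruction takes exactly one cycle and that any unit can run any instruction, which is much simpler than a real core; `cycles_needed` is a made-up helper, not a model of any actual pipeline.

```python
def cycles_needed(deps, width):
    # deps[i] lists the instructions whose results instruction i needs.
    # Each instruction takes one cycle; at most `width` can issue per cycle.
    # Assumes the dependencies form a DAG (no cycles).
    done = set()
    remaining = set(range(len(deps)))
    cycles = 0
    while remaining:
        # Ready means every producer finished in an earlier cycle.
        ready = sorted(i for i in remaining if all(d in done for d in deps[i]))
        issued = ready[:width]
        done.update(issued)
        remaining.difference_update(issued)
        cycles += 1
    return cycles

chain = [[]] + [[i] for i in range(7)]   # 8 instructions, each needs the previous one
independent = [[] for _ in range(8)]     # 8 instructions with no dependencies
```

With these inputs the dependent chain takes 8 cycles whether the width is 4 or 1024, while the independent instructions drop from 2 cycles at width 4 to 1 cycle at width 8. Widening only helps the second kind of code.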

--
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
Re:but... by Anonymous Coward · 2007-04-24 13:54 · Score: 0

How many operations can the human brain do concurrently? We are more than likely missing something key. Whenever this many people agree, we're usually wrong.
Re:but... by julesh · 2007-04-26 05:37 · Score: 1

They mention, for example, that current processing cores can typically only perform 4 simultaneous operations per-core, and imply that this is some kind of weakness. They completely fail to mention that the vast majority of applications running on those processors don't even use the 4 available scheduling resources in each core. In other words, the number of applications that would benefit from being able to execute more than 4 simultaneous instructions in the same core is vanishingly small.

What the article doesn't point out is that the folks who've designed the processor have done so in parallel with compiler research that has determined that the architecture they've produced is substantially better at allowing parallelism to be exploited. The reason most programs don't use all 4 of the current-gen ALUs at once is because the processor's logic for determining which instructions are ready to be executed isn't particularly good. The 4-instruction limit is a limit of current architectures, not a basic one (although it is a limit fundamental enough to the architecture that you don't really benefit by adding more than 4).

obvious drm implications.. by plasmacutter · 2007-04-24 09:07 · Score: 1

with individual instructions no longer spitting out to registers, and, to quote you, "Compiler algorithms and an implementation that create atomically executable blocks of code", does this not mean they can finally hide keys from us in the die of a general purpose processor?

--
VLC FOR MAC IS DYING! IF YOU DEVELOP, PLEASE SAVE IT!!

Re:obvious drm implications.. by dreamchaser · 2007-04-24 09:16 · Score: 1

I doubt that DARPA cares much about DRM, and if AMD and Intel wanted to they could have already hidden encryption keys on their CPU's.
Re:obvious drm implications.. by plasmacutter · 2007-04-24 09:35 · Score: 1

and if AMD and Intel wanted to they could have already hidden encryption keys on their CPU's.

not true, otherwise they would not be general purpose because they would not run every piece of x86 software thrown at them.

with the current architecture the key has to be in plaintext in one of the registers, which can then be dumped.

in this proposed architecture it can be passed to the next instruction throughout huge contiguous blocks of code w/o touching a register.

this also brings up the related issue of debugging on such chips.

--
VLC FOR MAC IS DYING! IF YOU DEVELOP, PLEASE SAVE IT!!
Re:obvious drm implications.. by dreamchaser · 2007-04-24 09:47 · Score: 1

I disagree. They could quite easily add extensions to the microcode and architecture that would allow harder to break DRM (there is no such thing as impossible to break). The key(s) would leak due to human nature though so it would be a futile effort.
Re:obvious drm implications.. by Anonymous Coward · 2007-04-24 10:42 · Score: 0

You think existing keys fit in a single machine register? AACS is already designed to never keep the entire key in plaintext. Of course sometimes the hardware, like the Xbox's HD-DVD player, helpfully adds a debug mode for that purpose, but it's ultimately not necessary to crack the key.

Moving the whole shebang, player key, decoder and all into one chunk of silicon is probably the dream of the content companies, but that idea's a non-starter for everyone else involved.

nothing spectacular by CBravo · 2007-04-24 09:08 · Score: 4, Informative

Right, let me begin by saying that after reading ftp://ftp.cs.utexas.edu/pub/dburger/papers/IEEECOMPUTER04_trips.pdf it actually became a bit clearer what they were talking about.

It might sound very novel if you are only accustomed to normal processors. Look at MOVE http://www.everything2.com/index.pl?node_id=1032288&lastnode_id=0 to see what transport-triggered architectures are about. They are more power efficient, etc etc.

Secondly, they talk about how execution graphs are mapped onto their processing grid. I don't think any scheduler has a problem with scheduling an execution graph (or whatever name you give it) to an architecture. Generally, it can be scheduled in-time (there is a critical path somewhere) or it is scheduled with a certain degree (generally > .9 efficient) of optimality. I don't see the gain there in efficiency.

Now here comes the shameless self-plug. If you want to gain efficiency in scheduling a node of an execution graph you have to know which node is more critical than the other. The critical nodes (the ones on the critical path) need to be scheduled to the fast/optimized processing units and the others can be scheduled to slow/efficient processing units (and they can get some communication delays without penalty). See http://ce.et.tudelft.nl/publicationfiles/786_11_dhofstee_v1.0_18july2003_eindverslag.pdf for my thesis.
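
For anyone who wants to see what "critical" means here, below is a generic critical-path sketch in Python. It is a textbook earliest/latest start computation, not the method from the thesis, and the function name, the `duration` and `succs` inputs and the example graph are all invented for illustration. Nodes with zero slack are on the critical path; the rest can tolerate a slower unit or some communication delay.

```python
def critical_nodes(duration, succs):
    # duration: node -> time it takes; succs: node -> list of dependent nodes.
    # Returns (makespan, set of nodes with zero slack). Assumes a DAG.
    preds = {n: [] for n in duration}
    for n, outs in succs.items():
        for m in outs:
            preds[m].append(n)

    # Topological order (Kahn's algorithm).
    indeg = {n: len(preds[n]) for n in duration}
    order = []
    stack = [n for n in duration if indeg[n] == 0]
    while stack:
        n = stack.pop()
        order.append(n)
        for m in succs.get(n, []):
            indeg[m] -= 1
            if indeg[m] == 0:
                stack.append(m)

    # Forward pass: earliest start of each node.
    earliest = {}
    for n in order:
        earliest[n] = max((earliest[p] + duration[p] for p in preds[n]), default=0)
    makespan = max(earliest[n] + duration[n] for n in order)

    # Backward pass: latest start that does not delay the whole graph.
    latest = {}
    for n in reversed(order):
        latest[n] = min((latest[m] for m in succs.get(n, [])), default=makespan) - duration[n]

    return makespan, {n for n in order if latest[n] == earliest[n]}

duration = {"a": 2, "b": 3, "c": 1, "d": 2}
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
makespan, critical = critical_nodes(duration, succs)
```

In this example the makespan is 7 and the critical nodes are a, b and d; node c has two time units of slack, so it is the candidate for a slow/efficient unit.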

--
nosig today

Re:nothing spectacular by David+Greene · 2007-04-24 16:45 · Score: 1

Nod to you for referencing transport-triggered architectures and TRIPS' relationship to them. Good job!

But I must disagree with you about scheduling. Scheduling is HARD. It's hard because there are lots of unknowns like memory latency and how the OOO engine will actually execute the code. The TRIPS compiler doesn't so much schedule the operations themselves as it does the communication. It has to worry about things like routing distance and so on. Yes, a naive implementation would be straightforward but doing something intelligent is often much more difficult.

Thanks for the reference to your work. I'll definitely take a look.

--

Nope, the most obvious joke is by varmittang · 2007-04-24 09:08 · Score: 1

Can it run Linux?

--
-----BEGIN PGP SIGNATURE-----
12345
-----END PGP SIGNATURE-----

Re:Better support for concurrency in Languages by Anonymous Coward · 2007-04-24 09:21 · Score: 2, Insightful

A lot of this is due to the fact that most popular languages right now do not support concurrency very well. Most common languages are stateful, and state and concurrency are rather antithetical to one another. The solution is to gradually evolve toward languages that solve this either by forsaking state (Haskell, Erlang) or by using something like transactional memory for encapsulating state in a way that is easy to deal with (Haskell's STM, Fortress (I think), maybe some others).

Concurrency is not that hard to do well in the right setting.

And before anyone claims that Haskell and Erlang are impractical, there are many examples of "real world" programs written in them.

A few nice, and very useful ones are Yaws (for erlang) and Darcs (for Haskell). There are many others (even quake clones), which I won't bother listing, but people can find them easily if they look.

Regarding concurrency, and its ease of use in these languages, I'm taking a machine learning class at the moment where most of the problems are computationally intensive, and could stand for improvement by making use of multiple cores. I do all of my assignments in Haskell, and not only are my solutions often shorter than those of my classmates (and they often work fine the first time they compile), but it's usually trivial to allow my application to scale nicely to as many cores as I can throw at it. It's worth mentioning, by the way, that most algorithms given in these classes are given under the assumption that people are using imperative languages, and even then, it's still easy. It takes a while to learn how to approach problems differently without mutable state, yes, but it's not as hard as some people make it out to be. I think it has more to do with the fact that people just don't like to learn anything new unless they absolutely are forced to do so, which is a pity.

By the way, there is a nice presentation from Tim Sweeney on what he would like future programming languages to look like, and there's a lot in there about functional programming, concurrency, and expressive (re: dependent) types.

data graph by LordMyren · 2007-04-24 09:21 · Score: 1

data graph sounds suspiciously like some kind of branching transactional execution system.

Welcome to 1994 by Anonymous Coward · 2007-04-24 09:26 · Score: 0

...and the first round of lab testing for EPIC. If they keep this up, eventually they can independently invent Itanium1 (yuk).

I love how they skipped EPIC in their comparison section in the pdf.

Re:Welcome to 1994 by Wesley+Felter · 2007-04-24 09:56 · Score: 3, Informative

EPIC (i.e. Itanium) is still based on centralized structures like register files. To create a 16-issue EPIC processor, you'd need a ~32R/16W port register file which would be virtually impossible to build because it would be so huge and power-hungry. Also, EPIC needs heroic compiler optimizations to overcome its in-order execution, while EDGE is naturally out-of-order.
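
The port count is simple arithmetic, sketched here in Python. The assumption (mine, for illustration) is the worst case where every instruction issued in a cycle reads two source registers and writes one result through the same central register file; `register_file_ports` is a hypothetical helper.

```python
def register_file_ports(issue_width, reads_per_op=2, writes_per_op=1):
    # Worst case: every instruction issued in a cycle reads two source
    # registers and writes one destination register through the shared file.
    return issue_width * reads_per_op, issue_width * writes_per_op

ports_16 = register_file_ports(16)
ports_4 = register_file_ports(4)
```

That gives 32 read and 16 write ports at 16-issue, against 8 and 4 at 4-issue. Every one of those ports has to reach every register, which is why a single centralized file gets so big and power-hungry as the issue width goes up.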

Nice Processor! by Supreme+Dragon · 2007-04-24 09:32 · Score: 0

Is it fast enough to run Vista?

Re:Nice Processor! by Anonymous Coward · 2007-04-24 09:44 · Score: 0

Yes, but don't turn on Aero.

The Compiler Is Key by A440Hz · 2007-04-24 09:38 · Score: 1

I've worked in detail with a VLIW (Very Long Instruction Word) architecture, the TI 'C6x DSP. It has eight execution units (not all of which can perform the same operations, though there is a little overlap) which can all be active in a single cycle. However, the key is keeping all of the units busy.

While the C compiler for this architecture is incredibly good, there are situations where using raw assembly (quite hard because of pipelining issues) or "compiled assembly" (easier, since you write in the order you wish operations to occur, and the compiler schedules the pipeline for you) gives better performance.

In short, no matter how much hardware folks can throw at a computing problem, the issue is adapting lots of different kinds of software to the architecture. Sounds like the compiler is going to have to be very good, or else there will have to be some runtime mojo to keep all of the chip doing something useful.

Yeah but... by da007 · 2007-04-24 09:38 · Score: 1

Does it run Linux?

Re:Better support for concurrency in Languages by Klintus+Fang · 2007-04-24 09:39 · Score: 1

That is food for thought.... thanks! ;-)

--
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot

I want one by Anonymous Coward · 2007-04-24 09:39 · Score: 0

But when are they likely to be ready?

actually... by slew · 2007-04-24 09:40 · Score: 1

You can read more about it here...

Actually, from what I can tell it's more like a VLIW with its program chopped up into horizontal and vertical microcode "chunks" for more efficient register forwarding, than a vector processor...

I figure that it chops up the code into 128-instruction chunks (or smaller if there are branch dependencies that can't be done with predicates) and schedules it horizontally (the classic wide VLIW microcode which feeds independent instruction pipelines), and vertically (the sequence that can distribute over time and use register forwarding paths). The pipelines seem to be loosely coupled through reservation stations and the forwarding done with low bandwidth wormhole routes so it isn't as rigid as a classic VLIW machine.
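
A deliberately naive Python sketch of that chunking guess. It is only an illustration of the idea as described here, not the real TRIPS compiler: it ignores predication entirely and simply closes a chunk when it is full or right after a branch. `form_blocks` and the `(opcode, is_branch)` format are invented for the example.

```python
def form_blocks(instrs, max_size=128):
    # instrs: list of (opcode, is_branch) pairs in program order.
    # A block is closed when it is full or right after a branch.
    blocks, current = [], []
    for op, is_branch in instrs:
        current.append(op)
        if is_branch or len(current) == max_size:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

# Three adds, a branch, then two multiplies.
instrs = [("add", False)] * 3 + [("br", True)] + [("mul", False)] * 2
blocks = form_blocks(instrs)
```

Branchy scalar code ends up in lots of short chunks under this scheme, which is the reason to doubt it helps much there unless predication can merge them.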

I doubt it does that much better with normal scalar code (which has lots of branches), but it probably is much better than a vector processor would be with irregular code.

And when it is determined to have bugs??? by WindBourne · 2007-04-24 10:00 · Score: 1

will the new name be captain trips?

--
I prefer the "u" in honour as it seems to be missing these days.

Re:And when it is determined to have bugs??? by Anonymous Coward · 2007-04-24 11:01 · Score: 0

or Project Blue if IBM gets involved

Oh, guys, come on... by harry666t · 2007-04-24 10:14 · Score: 0

...nobody is going to use it if it's not x86 compatible.

This is just an update from year ago... by coldmist · 2007-04-24 10:15 · Score: 4, Informative

Here is the slashdot article from 2003 about this processor: link

The specs have been updated to 1024 from 512, but that's about it.

Another 3-5 years out?

--
Don't steal. The government hates competition.

Re:This is just an update from year ago... by naoursla · 2007-04-24 19:56 · Score: 1

I am pretty sure one difference is that this time they have a physical chip.

TRIPS may solve some problems by knowsalot · 2007-04-24 10:24 · Score: 2, Informative

The big thing that all the commenters have missed that I've read so far is the fact that OOO execution is difficult not because it's hard to make many ALU's on a chip (vector design, anyone?) but because in a general-purpose processor the register file and routing complexity grows as N^2 in the number of units. That's bad. Every unit has to communicate with every other unit (via the register file or, more commonly, via bypasses to an OOO buffer for every stage prior to writeback). The issue being addressed here is wiring complexity which, as modern designers would tell you, is a much harder problem than designing fast logic. Routing is hard. Plunking down more ALU's is easy. If you eliminate the register file, and design your processor and ISA to feed instructions in a data-flow manner to thousands of ALU's then you may be able to vastly simplify routing requirements, thereby decreasing the length of your critical path electrical circuits, thereby allowing the processor to clock faster. (Data-flow execution is executing instructions when their data inputs are ready, rather than tracking the compiler-optimized order, which does not have the run-time information that the hardware has.) If you are clever about your compiler, and make your hardware wide enough, you can for example speculatively execute both sides of a branch until it is resolved, thus eliminating a certain percentage of pipeline stalls for branch mispredicts. Similarly, with data-prediction you can speculate during cache misses. The list goes on. This is a very new and different paradigm (ugly word) for CPUs which may lead to higher IPC. This isn't the single golden goose, but it's a very different way of looking at the problem of pushing more instructions through a processor at higher speeds.

Re:TRIPS may solve some problems by knowsalot · 2007-04-24 10:35 · Score: 3, Interesting

Oh, and before someone points this out for me, you have to imagine that the routing requirements are VASTLY improved. Imagine a grid of ALU's each connected by a single bus, (simple,) rather than 128 bypass busses all multiplexed in to each ALU. (chaos! don't forget the MUX logic!) You map one instruction to one (virtual) ALU, rather than one result to a (virtual) register. Then you pipeline/march each instruction with its partial data down the grid until all the inputs come in. Instructions continually cascade in the top of the grid, and commit out the bottom. But their results are available to feed other instructions as soon as they are computed! Never have to wait for a MUX or a bus or what-have-you. Plus, you can clock the whole thing EXTREMELY fast, because you don't have these wire-delays from difficult routing requirements.
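
A back-of-the-envelope Python sketch of the wiring argument. It compares a full bypass network, where every unit can forward to every unit, with a grid where each ALU is wired only to its immediate neighbours. Both functions are simplified counting models invented for this comparison, not figures from any real design.

```python
def full_bypass_paths(n_units):
    # Every unit's result can be forwarded to every unit's inputs (itself included).
    return n_units * n_units

def mesh_links(rows, cols):
    # Each ALU is wired only to its horizontal and vertical neighbours.
    return rows * (cols - 1) + cols * (rows - 1)

bypass_16, mesh_16 = full_bypass_paths(16), mesh_links(4, 4)
bypass_64, mesh_64 = full_bypass_paths(64), mesh_links(8, 8)
```

For 16 units that is 256 paths against 24 links, and for 64 units 4096 against 112. The catch is that in the grid a result may need several hops to reach its consumer, so where the compiler places each instruction starts to matter.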

Don't dismiss it by er824 · 2007-04-24 10:31 · Score: 5, Informative

I apologize if I butcher some of the details, but I highly recommend that anyone interested peruse the TRIPS website.

http://www.cs.utexas.edu/~trips/

They have several papers available that lay out the rationale for the architecture.

The designers of this architecture believed that conventional architectures were going to run into some physical limitations that were going to prevent them from scaling further. One of the issues they foresaw was that as feature size continued to shrink and die size continued to increase chips would become susceptible to, and ultimately constrained by, wire delay, meaning the amount of time it took to send a signal from one part of a chip to another would constrain the ultimate performance. To some extent the shift in focus to multi-core CPUs validates some of their beliefs.

To address the wire delay problem the architecture attempts to limit the length of signal paths through the CPU by having instructions send their results directly to their dependent instructions instead of using intermediate architectural registers. TRIPS is similar to VLIW in that many small instructions are grouped into larger instructions (Blocks) by the compiler. However it differs in how the operations within the block are scheduled.

TRIPS does not depend on the compiler to schedule the operations making up a block like a VLIW architecture does. Instead the TRIPS compiler maps the individual operations making up a large TRIPS instruction block to a grid of execution units. Each execution unit in the grid has several reservation stations, effectively forming a 3 dimensional execution substrate.

By having the compiler assign data dependent instructions to execution units that are physically close to one another, the communication overhead on the chip can be reduced. The individual operations wait for their operands to arrive at their assigned execution unit; once all of an operation's dependencies are available, the operation fires and its result is forwarded to any waiting instruction. In this way the operations making up the TRIPS block are dynamically scheduled according to the data flow of the block, and the amount of communication that has to occur across large distances is limited. Once an entire block has executed, it can be retired and its results can be written to a register or memory.

At the block level a TRIPS processor can still function much like a conventional processor. Blocks can be executed out of order, speculatively, or in parallel. They have also defined TRIPS as a polymorphous architecture, meaning the configuration and execution dynamics can be changed to best leverage the available parallelism. If code is highly parallelizable it might make sense to allow bigger blocks to be mapped. However, by performing these types of operations at the level of a block instead of for each individual instruction, the overhead is theoretically drastically reduced.
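
The firing rule can be sketched in a few lines of Python. This is a toy interpreter under my own assumptions, not the TRIPS hardware: `run_block`, the `ops` format and the example block are all hypothetical, and physical placement and routing distance are ignored. An operation fires as soon as all of its operands have arrived, regardless of where it sits in the listing.

```python
def run_block(ops, inputs):
    # ops: name -> (function, list of operand names). An operand is either
    # a block input or the name of another op in the same block.
    values = dict(inputs)   # operands that have already "arrived"
    pending = dict(ops)
    fired = []              # order in which ops actually fired
    while pending:
        ready = [n for n, (_, srcs) in pending.items() if all(s in values for s in srcs)]
        if not ready:
            raise ValueError("block has an unsatisfiable dependency")
        for n in ready:
            fn, srcs = pending.pop(n)
            values[n] = fn(*(values[s] for s in srcs))  # result forwarded to consumers
            fired.append(n)
    return values, fired

ops = {
    "sum":  (lambda a, b: a + b, ["x", "y"]),
    "out":  (lambda s, d: s * d, ["sum", "diff"]),  # listed second, fires last
    "diff": (lambda a, b: a - b, ["x", "y"]),
}
values, fired = run_block(ops, {"x": 5, "y": 3})
```

Here `sum` and `diff` fire together in the first round and `out` fires once both results have arrived, so the firing order is sum, diff, out even though `out` is listed second. Only the final values would need to be written back when the block retires.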

There is some flexibility in how the hardware can be utilized. For some types of software with a high degree of parallelism you may want very large blocks, when there is less data level parallelism available it may be better to schedule multiple blocks onto the substrate simultaneously. I'm not sure how the prototype is implemented but the designers have several papers available where they discuss how a TRIPS style architecture can be adapted to perform well on a wide gamut of software.

new innovations by Anonymous Coward · 2007-04-24 10:48 · Score: 0

whatever happened to the laser cpu developed in iran years ago?

oblig parallel just move along by cats-paw · 2007-04-24 11:06 · Score: 1

just move just move along now it's just more nothing to see here, along now nothing to see here,parallel processing it's just move along now it's just nothing to see here, just more parallel processing just move more parallel along now nothing to see here, more parallel processing it's just processing

--
Absolute statements are never true

One word: by robyannetta · 2007-04-24 11:21 · Score: 2, Funny

Skynet

--
- Just my $0.02, take with a grain of salt, your mileage may vary.

4 years old... by Anonymous Coward · 2007-04-24 11:34 · Score: 0

Come on guys. TRIPS has been around for something like 4 years.

hmm by Anonymous Coward · 2007-04-24 11:58 · Score: 0

but will it play Doom?

Re:hmm by quiahuitl · 2007-04-24 12:48 · Score: 1

There will be too many zombies, and they'll be too fast. The only good thing is that zombie AI isn't so advanced, so there will just be more blood everywhere (as usual, of course).

Explicit Data Graph Execution by Anonymous Coward · 2007-04-24 12:04 · Score: 0

Cool... I have lots of explicit data!

Well... by aldo.gs · 2007-04-24 13:30 · Score: 1

...I guess this is gonna be trippy, all right.

naah by yoprst · 2007-04-24 13:44 · Score: 1

wake me up when they have trillions of jumps and memory reads/writes...

Re:naah by X86isa · 2007-04-25 14:23 · Score: 1

OK, you can wake up now - we do. :)

Compilers Are Not Magic, or Why IA64 Didn't Work by Erich · 2007-04-24 13:47 · Score: 2, Insightful

TRIPS, like many other projects optimized to produce the largest number of PhD students possible, starts out with a premise something like:

So, I have this big array of CPUs/ALUs/Functional Units, all I need to do is program them and I can be the computingest damn thing you ever saw!

And it's true. You build a sea of ALUs and you sic some folks on hand coding all sorts of things to the machine, and you end up with some spectacular results.

The problem is that we still can't get a compiler to do a good job at it, for the most part. We thought we could, and we threw every bell and whistle into IA64 for a compiler-controlled architecture, and you've seen what we've ended up with. Many years later, the situation is still pretty much the same: the compiler can't do all that great of a job with these sorts of machines.

Don't get me wrong, there are lots of good ideas in TRIPS or any of the various other academic projects like it, but I'm yet to be convinced that it's useful in any kind of real codebase that's not coded by hand by an army of graduate students. For some tasks, that's an acceptable model -- It's been the model in the world of signal processing for quite a while (though becoming less so daily) -- but for most mainstream applications it just won't fly.

That, and it's hard for compilers to have knowledge about history. It's terribly important for optimization, and it's just hard to get into the compiler (though relatively easy to get into a branch predictor).

--

-- Erich

Slashdot reader since 1997

Re:Compilers Are Not Magic, or Why IA64 Didn't Wor by Watson+Ladd · 2007-04-24 14:09 · Score: 1

Depends on the compiler. This sounds like MLRISC should have no problem targeting it.

--
Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD

X86? In the desktop market, maybe... by Anonymous Coward · 2007-04-24 16:50 · Score: 0

X86 does not rule the world, you know!

- Xbox 360 contains a triple-core 3.2 GHz PPC processor with two hardware threads per core (i.e. 6 threads).
- PS3 contains a Cell processor (one general purpose core and 7 SPUs that are basically DSPs).
- Wii runs on the same kind of chip as the GameCube (can't remember, but I think it's a PPC?)

By the way, IBM makes the processors for all three of the next-gen game consoles I just listed. All three of them use graphics processors made by either NVidia or ATI (I can't remember which ones use which).

- Many embedded devices (routers, cell phones, etc) use Mips, ARM or PPC processors.
- Big iron servers often use non-x86 processors (POWER and PPC for example).

Yes, X86-style processors continue to dominate in the desktop computer market, but there are a LOT of other processors out there that use other designs, because they are cheaper or lower power, and compatibility with existing legacy Wintel software is not needed so much in those markets.

and 90% of it is too slow? or in java? by cheekyboy · 2007-04-24 20:25 · Score: 1

There are only a few things that need a 10-500x increase in speed.

Video transcoding.
Rendering farms - need a $500 solution that can outdo a $35,000 solution, i.e. 10 x $35 chips on a $20 card + profit margin and yearly software licence.
Folding type apps.
Nuclear/Sci sims.

--
Liberty freedom are no1, not dicks in suits.

GPP Prototype by FreakyLefty · 2007-04-24 20:50 · Score: 1

Is this anything like the last GPP prototype we heard about? The one that had a brain the size of a planet, but anything you connected to it tried to commit suicide?

--
Strength through redundancy and over-design

Don't fear the reaper by Anonymous Coward · 2007-04-24 22:22 · Score: 0

Thanks for reminding me. Gotta love Blue Oyster Cult.

http://www.youtube.com/watch?v=JLQzfdCs_HU

Imagine by polyp2000 · 2007-04-24 23:18 · Score: 0, Redundant

... A Beowulf cluster of these!

(sorry ... it had to be said!)

N.

--
Electronic Music Made Using Linux http://soundcloud.com/polyp

Coding for another NEW Architecture by T0wner · 2007-04-25 01:05 · Score: 1

Programmers are having enough problems writing code for the PS3 at the moment. Sony's support doesn't help; game companies are finding they have to ask IBM for help, since Sony knows only slightly more than they do about it. Just recently one of the lead PS3 launch managers set off to start an offshoot company to try and do some cool stuff with the Cell chip. It's almost as if they are saying, well, if we don't do it, who will? TRIPS is still in the research stage; there will be very little in terms of libraries to take advantage of it and even fewer APIs for anyone like EA to get their teeth into. Wake me up when this becomes news, please.

Re:Coding for another NEW Architecture by X86isa · 2007-04-25 14:17 · Score: 1

This is one of the key points that makes TRIPS so special: Very much unlike Cell, or GPUs, or even multicore X86s, TRIPS runs fast with NORMAL, sequential, ordinary C/C++ code. Your old programs will magically get faster, just like the good old days when MHz was increasing. So in short, all your old libraries already work on TRIPS.

TRIPS is also designed to evolve much faster in terms of MHz and # cores, etc. The key point of academia is to be 10 years ahead and tell industry what works and what doesn't because industry cannot afford to fail, so they don't try anything new. Cell and GPUs run faster by taking out hardware that helps programmers - TRIPS finds a way to keep things the way they always were but run faster anyway. It's an "under the hood" architectural enhancement to an existing ISA. (Not that you could tell from the average reporter spin, but the main point is that TRIPS is an alternative to getting speed from multiple cores.)

BTW, I don't get all these "slow development" comments - Intel typically has a research prototype 3-5 years before a chip goes into production. The reason they seem to release a new chip every 2 years is they have several in development at once.

Re:Better support for concurrency in Languages by Bill+Barth · 2007-04-25 01:18 · Score: 1

When you can show me a distributed memory parallel weather forecasting or climate prediction code written in Haskell, i.e. something that runs and scales well on large Linux clusters, has high interprocessor communication needs (both in terms of latency and bandwidth), and does a metric assload of floating point computations, I'll start to get interested. If you don't want to go so far as to include all the physics that go into weather and/or climate, just show me a Navier-Stokes simulator that has all those properties.

--
Yes...I am a rocket scientist.

quips are next by mindserfer · 2007-04-25 01:42 · Score: 1

After trips come quips, clearly: quadrillions of instructions per second.
AI humor is unavoidable at this point.

IA64, Rosetta by Anonymous Coward · 2007-04-25 03:06 · Score: 0

There's your answer. All it has to do is run the existing code reasonably fast (as in, not too much slower than x86), and people will buy them, especially when they see hot new stuff coming out for it.

I don't think these are the real deal, but if they were -- if we did suddenly have a CPU that was ludicrously faster than our best x86 -- probably the first thing that would happen is, someone would port Linux+Qemu to it, and benchmark Windows in that vs Windows on real x86.

183 comments