Next Generation Chip Research
Nyxs writes to tell us that Sci-Tech-Today is reporting that researchers at the University of Texas are taking a new approach to designing microprocessor architecture. Doug Burger, a computer science professor at the University of Texas, and his colleagues hope to solve many of the pressing problems facing chip designers today with the new "microprocessor and instruction set architecture called Trips, or the Teraop Reliable Intelligently Adaptive Processing System."
Is it reliable?
Apparently, one of the pressing problems that chip designers are facing is coming up with stupid, meaningless acronyms.
It doesn't actually look any different. 128 instructions per "block" executed in parallel, just like a superscalar processor. This has been around since the days of the Pentium (the Pentium wasn't VLIW, though). What exactly is new?
All the apps keep tripping up...
OMGWTFBBQ it's the University of Texas at Austin!
The article states that this works by sending blocks of up to 128 instructions at a time to the processor, where "The processor "sees" and executes a block all at once, as if it were a single instruction..." Makes you wonder if they'd ever get close to that target, as IIRC, one instruction in seven on average is a conditional branch.
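To put rough numbers on that worry: at one conditional branch per seven instructions (the commenter's recollection, not a measured figure), a full 128-instruction block would span about 18 branches. The usual workaround in the research literature, and as far as I know the one TRIPS-style compilers lean on (treat that as an assumption), is predication, or if-conversion: compute both sides of a short branch and select the result with a predicate, so the block never has to jump. A minimal Python sketch; the function names are hypothetical:

```python
BLOCK_SIZE = 128
BRANCH_EVERY = 7  # the commenter's recollected average, not a measurement
BRANCHES_PER_BLOCK = BLOCK_SIZE / BRANCH_EVERY  # roughly 18


def branchy(x: int, y: int) -> int:
    # Control dependence: which subtraction runs depends on a jump.
    if x > y:
        r = x - y
    else:
        r = y - x
    return r * 2


def predicated(x: int, y: int) -> int:
    # If-converted form: both paths are computed unconditionally and the
    # predicate p (0 or 1) selects one result arithmetically, so there is
    # no jump and both subtractions could issue in parallel.
    p = int(x > y)
    r_true = x - y
    r_false = y - x
    r = p * r_true + (1 - p) * r_false
    return r * 2
```

The price is wasted work on the path that isn't taken, which is why if-conversion is normally applied to short, hard-to-predict branches rather than everywhere.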
can i cook eggs with my heatsink?
We can easily understand how a loop could be calculated as a function if the contents of the loop block are composed solely of calculations. When this occurs, the output of the loop is simply a function of its input (f(x), if you will). However, computer scientists who think that programs can always be reduced to a simple function of given inputs have their heads too far in their books to see how the real world forces programs to be far removed from that ivory-tower gobbledygook.
In the real world, you aren't typically performing calculations in loops. Rather, you are usually reading and writing to memory, which may or may not be cached. So it isn't just a matter of saying f(x), it is much more complicated and possibly dependent on memory which you have no way to determine until the loop iteration reaches that point. And then you'll still get the bottlenecks which plague us today. Memory isn't fast enough, devices aren't fast enough, too much time is spent waiting for I/O to complete.
Pushing as much brute-force computation as possible onto compilers is fine. Let them unroll loops and optimize functions. But what are the limits to this? Can we really optimize our way to 1-step loops? I don't think so, but the DOD seems to think it is possible.
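The distinction drawn above can be made concrete. In the first two functions below, the result is a pure function of the input list, so the work can be split into independent partial sums. In the third, each load's index comes from the previous load, so no compile-time analysis lets the iterations overlap. A minimal Python sketch; all names are hypothetical:

```python
from typing import List


def sum_of_squares(xs: List[int]) -> int:
    # A pure function of the input: f(x). Iterations are independent.
    return sum(x * x for x in xs)


def chunked_sum_of_squares(xs: List[int], chunks: int = 4) -> int:
    # Same result as independent partial sums that could be computed in
    # parallel and combined in any order.
    step = max(1, -(-len(xs) // chunks))  # ceiling division
    partials = [sum_of_squares(xs[i:i + step]) for i in range(0, len(xs), step)]
    return sum(partials)


def chase(next_index: List[int], values: List[int], start: int, steps: int) -> int:
    # Memory-dependent loop: the index used in the next iteration is loaded
    # in this one, so the next load cannot start before this one returns.
    total = 0
    i = start
    for _ in range(steps):
        total += values[i]
        i = next_index[i]
    return total
```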
Overall they might make some things marginally more efficient, but they aren't solving any fundamental problems. They're simply moving some around slightly.
I seem to remember that Intel designed Merced (now the Itanium, known colloquially as the Itanic to reflect how well it's gone in the marketplace) to shift the burden of branch prediction and parallelism to the compiler. Or, in other words, the compiler was expected to mark instructions that were capable of running in parallel, and also to state which branches were likely to be taken.
All a great idea in theory; after all, the compiler should be able to figure out a fair amount of this information just by looking at the flow of data through the instructions (although it may not be so good at branch prediction; I'm not sufficiently strong on compiler theory and branch prediction to talk about that.) However, as can be seen by Itanium's (lack of) market success, the compiler technology just isn't there (or maybe we're using the wrong languages; there are, after all, languages that are designed to be inherently parallel.)
If this team can get it working the way they want to, maybe -- just maybe -- Itanium will find its niche after all. But let's not kid ourselves; this is a hard problem, and it's more likely that they'll make incremental improvements to the knowledge that's out there, rather than a major breakthrough.
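As a rough illustration of what "the compiler marks instructions that can run in parallel" means, here is a minimal Python sketch that packs straight-line code into groups of mutually independent instructions, loosely in the spirit of IA-64 instruction groups. The three-address format and the function name are hypothetical, and the sketch assumes each name is written only once, so only true (read-after-write) dependences matter:

```python
from typing import Dict, List, Tuple

# Hypothetical three-address code: (destination, [source operands]).
Instr = Tuple[str, List[str]]


def group_parallel(program: List[Instr]) -> List[List[str]]:
    """Put each instruction in the earliest group after all of its
    producers; instructions in the same group are mutually independent."""
    level: Dict[str, int] = {}  # destination name -> group index
    groups: List[List[str]] = []
    for dest, srcs in program:
        # Only sources produced earlier in this code delay the instruction;
        # live-in values (never assigned here) are available immediately.
        lvl = 1 + max((level[s] for s in srcs if s in level), default=-1)
        level[dest] = lvl
        while len(groups) <= lvl:
            groups.append([])
        groups[lvl].append(dest)
    return groups
```

For `a = x + y; b = x * 2; c = a + b; d = y - 1; e = c * d`, this yields the groups `[a, b, d]`, `[c]`, `[e]`: three issue steps instead of five.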
This sounds really cool.
" What exactly is new?"
A snazzy acronym.
I don't get a word he says, and I know a little bit about programming. Can someone dumb this down?
From what I know, a loop is a loop and you need to satisfy a condition and do some processing. Won't it be a problem if I don't have the data resulting from the last loop before I do the next one?
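Yes, and that is exactly the distinction that matters. In the first loop below, every iteration depends only on its own input, so all of them could run at once; in the second, each iteration needs the previous iteration's result (a loop-carried dependence), so they must run in order. A running sum like this can still be reorganized as a parallel scan, but not by simply running the iterations side by side. A minimal Python sketch with hypothetical names:

```python
from typing import List


def independent(xs: List[int]) -> List[int]:
    # Each output uses only its own input: every iteration could run at once.
    return [3 * x + 1 for x in xs]


def recurrence(xs: List[int]) -> List[int]:
    # Each output needs the previous output (a loop-carried dependence),
    # so iteration i cannot start until iteration i-1 has finished.
    out: List[int] = []
    prev = 0
    for x in xs:
        prev = prev + x  # running (prefix) sum
        out.append(prev)
    return out
```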
Bugs on the chip can lead to bad Trips
Their NEXT next-generation chips will be powered entirely by buzzwords and acronyms.
http://www.cs.utexas.edu/users/cart/trips/
> their code for parallel processing, and that's difficult or impossible for some applications.
>
> "The industry is running into a programmability wall, passing the buck to software and hoping the programmer
> will be able to write codes for their systems," he says.
So you want the programmer to be unaware of the parallel processing. Then the article goes off and says something stupid IMHO.
> a huge amount of control logic, control transistors that don't do any work -- they just consume power. Trips is trying to push some of that complexity back up into the compiler
I thought the point of TRIPS was to make the chip do all the scheduling (i.e. the dataflow architecture) rather than depend on the compiler-generated sequence of instructions. As a hobbyist compiler dev, I'd like to note that the dataflow graph (a DAG) is the basis of all compiler optimizers, though the typical compiler dev is likely to use it to allocate registers so as to minimize pipeline stalls. I admit that it can be done at the CPU level to some extent, which makes this even stranger.
> Trips compiler sends executable code to the hardware in blocks of up to 128 instructions. The processor "sees" and executes a block all at once, as if it were a single instruction, greatly decreasing the overhead associated
Somehow this just shifts the hard work of peephole optimisation to the CPU, to be done in real time. It would have been far better to do it properly in the compiler, since it needs extra memory and far more processing than the code being executed.
All in all, I don't see this thing revolutionizing general-purpose programming systems. Though what I call special-purpose programming might be the way the future of programming goes, I'm no Gordon Moore.
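For readers who haven't met the term: a peephole optimiser slides a small window over generated instructions and rewrites local patterns. A minimal Python sketch of two classic rules, assuming a hypothetical (opcode, dest, src1, src2) instruction format rather than any real ISA:

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, dest, src1, src2).
#   ("store", address, value_reg, "")   memory[address] = value_reg
#   ("load",  dest_reg, address, "")    dest_reg = memory[address]
#   ("add",   dest, src1, src2)         dest = src1 + src2
Instr = Tuple[str, str, str, str]


def peephole(code: List[Instr]) -> List[Instr]:
    """One pass of two classic peephole rules over straight-line code."""
    out: List[Instr] = []
    for op, dst, a, b in code:
        # Rule 1: 'add r, r, #0' changes nothing, so drop it.
        if op == "add" and dst == a and b == "#0":
            continue
        # Rule 2: a load from the address the immediately preceding store
        # wrote can reuse the stored register instead of going to memory.
        if op == "load" and out and out[-1][0] == "store" and out[-1][1] == a:
            out.append(("mov", dst, out[-1][2], ""))
            continue
        out.append((op, dst, a, b))
    return out
```

The store itself is kept, since memory still has to be updated; only the redundant reload becomes a register move.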
Does any major piece of software that folks use come from UT?
...
I can think of famous projects from MIT, Berkeley, Stanford, CMU, Caltech, Cornell
But I can't think of a single one from UT. Not a single one. Is there something we all use that comes from UT?
I know they have good petroleum engineering at A&M -- but I'm interested in CS.
Does it runs windows ?
No joke, I'm sure some asshat will eventually ask that one seriously.
This looks to me like a combination of old and not-so-good ideas.
I read about out-of-order execution and using data when ready at least 5 years ago in Hennessy and Patterson's book "Computer Architecture: A Quantitative Approach". To me it sounds like a typical scoreboarding architecture.
As for how he can claim that this will lead to less control logic, someone else will have to explain that to me.
And executing two instructions at once because their destination and value are the same sounds like an operation that will lead to more control logic. Besides, don't most compilers optimize away these kinds of cases?
Is it just me, or does this approach sound very similar to VLIW (http://en.wikipedia.org/wiki/VLIW) architecture? The problem is that branch prediction needs to be very accurate for any kind of performance boost.
Which is why these types of architecture lend themselves very well to sequences of operations that are very similar (video processing, etc.).
Will this work just as well in the general-computing sphere? No idea.
What this is *not*, in any form, is a general-purpose CPU. It won't boot Linux, plain and simple. This is for stream data processing such as compression, or HPC simulations. I seem to remember that their presentation showed a prototype doing software radio at a data rate usable for 802.11.
dependent, dependent, dependent
Brooklyn, the FAMOUS anonymous poster writes:
:)
"Teraop Reliable Intelligently Adaptive Processing System"
That's TRIAPS, not TRIPS.
>Does it runs windows ?
Nope, but it runs on Linux though!
ROTFL
No Joke, but you could:
1. Volunteer yourself,
2. Buy this Titanic II TRIPS chip,
3. Port GCC to it,
4. Compile Linux,
5. Be a hero!
???
6. Sorry, No profit.
Trips ... check ... check
Texas
Yup, we all know now they're producing bioweapons... Stephen King is a goddamn prophet.
Seriously, "The Stand" is a good book, but not really great (like "It").
The article alludes to executing large numbers of instructions simultaneously. It's like creating new pathways in the brain that make certain modes of thought more efficient. If it works, the shortcuts will avoid many program loops that would normally take processing time, and make the trip shorter.
I suppose the whole thing will have to be ACID compliant;)
I wonder how long it's going to take these innovations to catch on in mainstream computing? Given that most desktops are still running on architectures burdened by 30-year-old design practices... I'd just like to see RISC finally embraced to the degree it deserves. That alone would certainly open up a lot of innovative designs that aren't feasible with the x86.
I had an interesting discussion with a chip designer the other day. We were talking about parallel processing, and I spouted the usual received wisdom: "But isn't the problem with parallel processing that many problems are very difficult or impossible to do in parallel? And isn't programming in parallel really difficult?"
I found his answer very interesting, something like: "That line of thinking comes from when computers weren't fast enough to do the basic things we wanted them to do. It's true, an application like a word processor is not a good problem to tackle with parallel processing, but we don't need it to be these days. Nearly all the stuff we want to do today (faster graphics, 3D, video, image and sound processing, processing massive amounts of data on the web, all the processing that goes into keeping the internet and telephone networks going) is ideal for parallel processing. What Google does, that's essentially parallel processing, isn't it?"
That kind of changed my perception of things and made me realise my mindset was way out of date.
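The workloads listed above are easy to split because each piece of data is processed independently. A minimal Python sketch with hypothetical names, using a grayscale image stored as a list of rows: each row is brightened on its own, so rows can be handed to separate workers. The point is the structure; CPython threads will not actually make this pure-Python loop faster because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale pixels


def brighten_row(row: List[int], amount: int) -> List[int]:
    # Each pixel depends only on itself; clamp to the 0..255 range.
    return [min(255, p + amount) for p in row]


def brighten(image: Image, amount: int, workers: int = 4) -> Image:
    # Rows are independent, so they can be processed by separate workers;
    # pool.map returns the results in the original row order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: brighten_row(r, amount), image))
```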
While I'm reading TFA, could someone explain why branch "prediction" is such a big sticking point in CPU architecture? Surely a processor has the compiled code and a bunch of data; it doesn't need to predict anything because it's all laid out. By that I mean "for... if... break;": the processor shouldn't be surprised when it gets to that nested if, reaches the break and has to jump out of the loop, because it's clearly there in the code to start with. It's not like it just magically showed up, is it?
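A short answer: the code is laid out in advance, but which way an `if` goes depends on data that, in a pipelined CPU, usually hasn't been computed yet when the processor needs to fetch the next instructions. Rather than stall until the condition is known, it guesses, and throws the work away if the guess was wrong. A minimal Python sketch of one classic guessing scheme, a 2-bit saturating counter for a single branch; the function name and the initial state are assumptions:

```python
from typing import List


def two_bit_predictor(outcomes: List[bool]) -> int:
    """Simulate one 2-bit saturating counter for a single branch.
    States 0 and 1 predict not-taken; 2 and 3 predict taken.
    Returns the number of correct predictions."""
    state = 2  # assumed initial state: weakly taken
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct
```

For a loop branch that is taken nine times and then falls through, repeated ten times, the counter mispredicts only the loop exits and gets 90 of the 100 outcomes right. The two-bit hysteresis is what keeps one loop exit from flipping the prediction for the next run of the loop.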
The original announcement came in 2003:
http://www.utexas.edu/opa/news/03newsreleases/nr_
It seems to me any serious research into microprocessors will be hampered by the fact that it will be completely inapplicable unless it dumbs itself down to ape the x86 instruction set. All current and future processor design advances will be defined as better and faster ways of making modern silicon pretend it's a member of a chip family that was obsolete when the first President Bush was in office. That's not progress. That's just kind of sad.
Heaven help any researcher if implementing their new chip design requires a new software paradigm that doesn't fit neatly into the OS/Application model, too. We're living in the perpetual now of 2000, and it's some boring shit. I want my future back.
Bah.
SoupIsGood Food
This is not some boring superscalar! Nor is it some vector processor!
In fact, this is a complete departure from a von Neumann architecture. The architecture is called a dataflow architecture. In one sentence: a dataflow architecture is one where instruction execution is based on the availability of the instruction's inputs, not on a program counter.
The article does a very bad job of conveying the fact that this is a relatively new idea. Like most reporting, they present something that's been in research for some time as a huge breakthrough without describing it at all. Instead, it's really just an incremental step in dataflow computing research.
I work in a lab at the University of Washington on another dataflow architecture. It's a really interesting idea, but it will take some time to develop, and you're not going to get one on your desk for some years to come.
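To make the "no program counter" point concrete, here is a minimal Python sketch of a dataflow interpreter. The node format and names are hypothetical, and it ignores everything real hardware has to handle (memory, predication, operand networks). Nodes are deliberately listed out of program order, and each one fires in the first step in which all of its operands are available:

```python
from typing import Callable, Dict, List, Tuple

# A node is (operation, names of the values it consumes). Names can refer
# to other nodes or to external inputs.
Node = Tuple[Callable[..., int], List[str]]


def run_dataflow(nodes: Dict[str, Node], inputs: Dict[str, int]) -> Tuple[Dict[str, int], List[List[str]]]:
    """Fire every node whose operands are all available, step by step.
    Returns the computed values and the nodes fired in each step."""
    values: Dict[str, int] = dict(inputs)
    steps: List[List[str]] = []
    pending = set(nodes)
    while pending:
        # No program counter: readiness depends only on operand arrival.
        ready = sorted(n for n in pending if all(s in values for s in nodes[n][1]))
        if not ready:
            raise ValueError("some operands never arrive")
        for n in ready:
            op, srcs = nodes[n]
            values[n] = op(*(values[s] for s in srcs))
        pending -= set(ready)
        steps.append(ready)
    return values, steps


# Deliberately written out of program order.
program: Dict[str, Node] = {
    "e": (lambda c, d: c * d, ["c", "d"]),
    "c": (lambda a, b: a + b, ["a", "b"]),
    "a": (lambda x, y: x + y, ["x", "y"]),
    "b": (lambda x: x * 2, ["x"]),
    "d": (lambda y: y - 1, ["y"]),
}
```

With x = 3 and y = 5, nodes a, b and d fire together in the first step, c in the second and e in the third (e = 56), regardless of the order in which the nodes were written.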
Is the guy who runs this machine named Captain Trips?
Is it just me or does the article explain '95 technology?
It tells about loading blocks of instructions at a time (say, a cache line), then executing them whenever the data is available (which is called out-of-order execution).
In other words, they're going to overclock a Pentium I to 10 GHz and add an excess of pipelines to make it reach a teraflop. I could've done that (given the P1 design).
Actually, it sounds more like an FPGA. And, since VHDL is Turing-equivalent, it would actually be possible to compile C code (such as the Linux kernel) into a gate array and run it on such a chip.
See the homepage for the TRIPS project:
http://www.cs.utexas.edu/users/cart/trips/
The article doesn't do a good job of explaining the idea, which I think is very interesting. It's not mere branch prediction these people are talking about, and it's more than dumb parallel processing. They are basically fragmenting programs into small dataflow networks.
Pure functional programming languages will see a tremendous boost from architectures like Trips. In pure functional languages, variables are never reassigned, which makes it possible for all parts of an expression to be evaluated simultaneously. With 128 instructions at a time, it is possible that many algorithms that take a long time when executed sequentially will take constant time on this new architecture: matrix operations, quicksort, etc.
So why is this on slashdot?
First, I didn't RTFA yet, but I'm just wondering aloud...
So, eventually, would it be possible to build an 802.11[abg] software-defined radio that can sniff (using Kismet, of course) all 11 802.11[bg] channels and however many 802.11a channels there are, all at once? You could scan for all APs in about 1/10 of a second! If software-defined radio hardware (e.g. receivers, DSP chips, maybe even this chip) becomes cheap, will we have APs that communicate on 11 channels at once? (Yeah, I know they have MIMO, but that's only 3 channels, IIRC, and it doesn't use an uber-l33t software-defined radio chip!)
ttuttle is a rankmaniac
I wonder if this would handle concurrent programming and constraint-based inference (as in Mozart) far better than existing chip architectures.
> Actually, it sounds more like an FPGA.
Err, FPGA and DSP describe different things. AAMOF, their DSP could be implemented using a FPGA, so your statement is a bit confusing. DSP merely describes the functionality of a part, FPGA is a specific type of part.
All you need to know is that there are some people in this world called compiler weenies. They had over-protective mothers and spend their lives looking for the same kind of protection in the adult world. They worship compilers because compilers protect them from the scary realities of computer architecture and keep them nice and safe in the world of high-level languages.
Compiler weenies believe, above all else, that a good compiler can always do a better job than any human at optimisation. They are in denial of the following facts:
- Languages usually do not specify important info that may be used for optimisation, eg typical values of inputs
- Algorithm cannot be separated from architecture. A vector machine might do DFTs faster than FFTs. No compiler can turn an FFT into a DFT.
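For reference, the DFT and the FFT compute the same transform; the FFT just reorganises the arithmetic to need O(N log N) operations instead of the direct definition's O(N^2). A minimal pure-Python sketch of both (radix-2 Cooley-Tukey, so the FFT needs a power-of-two length). Whether the simpler, more regular DFT loop could ever win on a particular vector machine is the commenter's argument, not something this sketch demonstrates:

```python
import cmath
from typing import List


def dft(x: List[complex]) -> List[complex]:
    # Direct definition: N outputs, each a sum over N inputs, so O(N^2).
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x: List[complex]) -> List[complex]:
    # Recursive radix-2 Cooley-Tukey, O(N log N); len(x) must be a power of 2.
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out
```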
The article doesn't seem to agree:
So, it looks like they're trying to get Intel or AMD interested in producing a heterogeneous multi-core unit that includes their trippy core, in the hopes of keeping the number of cores (and their communications overhead) down to a minimum. Intel already has a form of (so-called) instruction-level parallelism with the Itanic, and it didn't work out too well (except maybe for crypto-heavy workloads). It's possible AMD will be mulling it over. One of the things they will have to worry about is whether a compiler can actually be written to use it, FTA:
With 128 instructions to schedule at once, that might provide a chance to actually keep all of the processing units on the chip busy. With the Itanic, it was really a challenge to do that, since you had to pull two floating point instructions out from somewhere in every clock cycle, something that not all workloads could accomplish, and I can see the compiler writers going crazy trying to produce some sorts of ultimately self-defeating hacks trying to get that accomplished :)
So why aren't dataflow machines mainstream?
The reason is that dataflow is really a non-algorithmic, signal-based approach to computing whereas most programming languages and applications are strictly algorithmic. We need to change our way of programming in a radical way before the non-algorithmic model can take off. It's not easy to translate algorithmic code into a dataflow application.
In my opinion, even though TRIPS has 'reliability' in its acronym, unless the execution of parallel code in a given object is synchronous, there is no way it can enforce reliability. To get an idea as to how a signal-based synchronous architecture can enforce software reliability, see the link below.
There was a somewhat famous CS person named Dijkstra who taught there for years. Perhaps you've heard of him.
He set the tone for UT's best-known research for years: theory. They've also got a couple of well-known robotics labs (not as well funded as CMU's, but they're more focused on improving the software brains than building big flashy machines to crash around in a desert).
Beyond CS undergrad (which is UT's second-largest major, behind biology, and UT is the most populous university in the USA), UT's got a good grad program.
I had a class with Doug Burger. Great guy. Brilliant, too.
Just because you haven't heard of something doesn't mean it doesn't exist. Most work universities do doesn't get published on Slashdot; it goes into research journals and conferences that I'm sure you don't read or attend.
-
You just don't know about it.
The UT Applied Research Lab has developed the basic technology behind pretty much every US military sonar system in use since WWII. Ditto for a number of satellite and other technologies (mostly defense-related, but all of that trickles down into mainstream usage). ARL is a combination of CS, ME, EE and other engineering fields.
Numerous search engine technologies, and the closely related 'recommendation' systems that places like Amazon use, have been born and bred...
UT does mostly foundational software and research work, which is acquired and built upon by others.
-
TRIPS definitely doesn't look like it was targeted at the desktop; it's aimed more at DSP-like apps requiring high throughput and a constant input data stream. They mention this in the article (software-defined radio, coprocessors for actual general-purpose processors). So an architecture like this may be competitive with something like Cell or Imagine, or maybe TI's high-end DSPs, but not with a single-core processor targeted at business apps and the like on x86 platforms. And part of the reason TRIPS is a good design is that the compiler guys and the hardware guys are part of the same group, and probably sat down and hammered out an ISA that allows for maximum extraction of parallelism by the compiler.
Btw, the main reason we're still using x86 is economics. It would take not just a better design to get companies to suddenly move on and abandon their legacy stuff; it would take something revolutionary with insanely good marketing. The drift to RISC-type ISAs is happening... just very, very slowly (I believe both AMD and Intel convert x86 CISC-type instructions into RISC-like uops, which are then executed, no?)
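On that last point: yes, modern x86 front ends crack complex instructions into simpler micro-ops. Here is a heavily simplified, hypothetical Python sketch of the idea; real decoders and micro-op formats are far more complex and are not documented in this form. A memory-destination add becomes a load, an add and a store, while a register-only add stays a single micro-op:

```python
from typing import List, Tuple

Uop = Tuple[str, ...]  # hypothetical micro-op: (operation, operands...)


def crack(instr: str) -> List[Uop]:
    """Split 'op [addr], reg' into load / op / store micro-ops;
    register-only forms 'op dst, src' pass through as one micro-op."""
    op, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    if dst.startswith("[") and dst.endswith("]"):
        addr = dst[1:-1]
        # Read-modify-write through a temporary register.
        return [("load", "tmp", addr), (op, "tmp", "tmp", src), ("store", addr, "tmp")]
    return [(op, dst, dst, src)]
```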
It seems to me that they just took the superscalar architecture description from Hennessy & Patterson, the description of the Tomasulo algorithm, and added a bit more of everything. And yes, they hope they will have enough instruction-level parallelism to utilize all the functional units they have.
> Does any major piece of software that folks use come from UT? I can think of famous projects from MIT, Berkeley, Stanford, CMU, Caltech, Cornell
National Instruments, of Austin, TX, sells a graphical programming language, called LabVIEW, which has about a 90% market share in the research sector [both for-profit and "not-for-profit"], and which is moving aggressively into the automation sector [i.e. the factory floor].
PS: Ironically, LabVIEW 8.0 was just announced yesterday.
PPS: Unlike many of their competitors [e.g. Agilent], National Instruments weathered the dot-com/dot-bomb tech debacle pretty well:
> Unlike many of their competitors [e.g. Agilent], National Instruments weathered the dot-com/dot-bomb tech debacle pretty well...
Here's a better graphic of what I was talking about:
Or this:
For all those who program (except maybe those who work in assembly), this will not impact you directly for a while. This is at the machine level, and it's a different kind of architecture. It may need new compilers or even new languages. But it's not there yet. Just because you program every day doesn't mean this will apply to you. This is mainly for folks who are interested in more than just how to make loops in code.
So, if all you do is program computers, wait for the trickle down before claiming it's good or bad...
If each instruction executes when its inputs are available, rather than in any specified order, and passes its outputs to the next instruction, rather than to a specific register, it seems like such a system would be best for functional programming. Is there any truth to that?
But often the ideas don't pan out in real life. With TRIPS, you get inflated IPC results from inflated instruction counts from huge superblock schedules. The TRIPS compiler (last I saw) was not suitable for real life applications. The fact that they can fab things like TRIPS chips and boards only shows that we have so many transistors on a chip these days, any crazy-ass idea you have can be produced.
With modern out-of-order pipelines, you get instruction issues in dataflow order. You have to be very careful when trying to encode dataflow in the instruction set. If you obey it, you can have problems when instructions don't have the latency you expect (like cache misses).
If you want my opinion on where the interesting ideas that will get used in the future are, look at what people are going to do with PS3/Cell. Look at languages and operating environments that facilitate parallel structuring of code. Once you get the program into independent threads of execution with low synchronization, the number of independent instructions you can present to the machine skyrockets.
I dunno, maybe I'm too critical. The guys at UT Austin are doing interesting things, and getting interesting results. If nobody was doing research, nobody would come up with the clever ideas we need in the industry. Work on the compiler, especially, has usefulness outside of a TRIPS ALU grid. I just think this "grid of ALUs" idea is getting old, and since UT and UW like these ideas, it's most of what you see in ISCA and MICRO.
On the other hand, I guess if you have a billion transistors, why not throw a big grid of ALUs in there?
I guess we'll see what pans out in the industry in the upcoming years. I'd place my money on more threads/CPUs, and not so much in the "sea of ALUs" approach. But I know the company I left last year was thinking of this kind of idea in a real product. If you can make it work for real, you can make some bucks. If not... well you can publish some papers I guess. :-)
-- Erich
Slashdot reader since 1997
If you have a chance to take his Microprocessor Architecture class, do so. He rocks! We had people from the other professors' sections sneaking into ours to actually learn the material.
The TRIPS homepage has nine published papers on how this design will work and a schematic diagram of what they're expecting the design to end up looking like. They are also promising simulators and compilers later this year.
Wrong. It is entirely general purpose. Although the prototype won't boot Linux -- the prototype TRIPS boards will run under the control of an embedded Linux on a PPC machine -- that is only being done to simplify the design of the prototype chip and so no OS has to be ported to it to make the thing work.
... Captain Trips, my first thought as well.
RIP Jerome John Garcia 1942-1995
Thanks for the Memories.
Hardly.
Superscalar architectures are dynamic placement, dynamic issue: they make it the hardware's responsibility to figure out both WHERE and WHEN to fire an instruction. This carries tremendous control and logic overhead.
TRIPS is a static placement, dynamic issue architecture: the compiler (or assembly language programmer) decides WHERE (i.e. which ALU) to place an instruction, and the instructions fire dynamically, in this case when all of their inputs have arrived.
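A toy model of "static placement, dynamic issue", under loudly stated assumptions: every operation takes one cycle, every hop on the operand network between ALU coordinates takes one cycle, and routing distance is Manhattan distance. None of these numbers are TRIPS's real timing, and the names are hypothetical. The point is that the compiler's choice of WHERE still changes WHEN things fire, because operands take longer to reach distant ALUs:

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[int, int]]  # instruction -> (row, col) of its ALU
Deps = Dict[str, List[str]]             # instruction -> producer instructions


def fire_times(deps: Deps, place: Placement, op_latency: int = 1, hop_latency: int = 1) -> Dict[str, int]:
    """Cycle at which each instruction fires: as soon as its last operand
    arrives. Placement is fixed in advance; timing follows operand arrival."""
    times: Dict[str, int] = {}

    def fire(i: str) -> int:
        if i not in times:
            arrival = 0  # instructions with no producers fire at cycle 0
            for p in deps[i]:
                (r1, c1), (r2, c2) = place[p], place[i]
                hops = abs(r1 - r2) + abs(c1 - c2)  # assumed Manhattan routing
                arrival = max(arrival, fire(p) + op_latency + hops * hop_latency)
            times[i] = arrival
        return times[i]

    for i in deps:
        fire(i)
    return times
```

With `c` depending on `a` and `b`, and `d` depending on `c`, placing the four instructions on neighbouring ALUs lets `d` fire at cycle 4; scattering them across a 4x4 grid delays it to cycle 11, even though the dependences are identical.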
There have been many research machines, but no successful commercial products. Dataflow techniques seem to have had their greatest impact in two areas: compiler optimization and instruction scheduling inside the CPU. Many optimizations use SSA, or static single assignment form, in which every variable is assigned a value exactly once. Converting to SSA makes the data dependencies explicit, so the computation (within a basic block, at least) can be represented as a directed acyclic graph (DAG), which is useful for code generation. Dataflow is also implemented in hardware to enable parallelism and features like speculative execution in the pipeline.
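A minimal Python sketch of SSA renaming for straight-line code, assuming a hypothetical (dest, [operands]) format: every assignment gets a fresh versioned name, and every use refers to the latest version, so each use points at exactly one definition. Real SSA construction must also insert phi functions where control flow merges, which this branch-free sketch omits:

```python
from typing import Dict, List, Tuple

# Straight-line code: (destination, [operand names]). Names that are never
# assigned are treated as inputs and keep their original spelling.
Stmt = Tuple[str, List[str]]


def to_ssa(code: List[Stmt]) -> List[Stmt]:
    version: Dict[str, int] = {}

    def current(name: str) -> str:
        # Latest definition of name, or the input itself if never assigned.
        return f"{name}_{version[name]}" if name in version else name

    out: List[Stmt] = []
    for dest, srcs in code:
        new_srcs = [current(s) for s in srcs]  # uses read the latest version
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}_{version[dest]}", new_srcs))
    return out
```

For `x = a + b; x = x * 2; y = x + a`, the result is `x_1 = a + b; x_2 = x_1 * 2; y_1 = x_2 + a`, and the dependence edges can be read straight off the names.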
Experience has shown that there is only so much parallelism that can usefully be exploited using either compiler or hardware based dataflow based techniques. This is not a good sign for this project, unless they are targeting primarily very parallel applications, for example DSP algorithms or image processing. Even so, other research groups have tried this and failed (or at least not succeeded). One is the RAW architecture at MIT: http://cag-www.lcs.mit.edu/raw/ Another example is iWarp, a CMU/Intel systolic processor. RAW is currently active, iWarp is over.
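One simple way to quantify "only so much parallelism": the ideal instruction-level parallelism of a piece of code is its instruction count divided by the length of its longest dependence chain, assuming unit latency, unlimited functional units and perfect branch prediction. A minimal Python sketch with hypothetical names:

```python
from typing import Dict, List


def available_ilp(deps: Dict[str, List[str]]) -> float:
    """Ideal ILP of a dependence DAG: instructions / critical-path length.
    Assumes unit latency and unlimited functional units. Keys must be
    listed with producers before consumers, and deps must be non-empty."""
    depth: Dict[str, int] = {}
    for instr, producers in deps.items():
        depth[instr] = 1 + max((depth[p] for p in producers), default=0)
    return len(deps) / max(depth.values())
```

A pure chain gives 1.0 no matter how many ALUs you have; four independent instructions feeding a fifth give 2.5. Real programs sit somewhere in between, which is the ceiling any grid of ALUs runs into.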
Make that 780 million ARM processors, with 80% of cell phones using them. I believe they're on their way to 1 billion this year.
In communist Russia, CPU uses users to compute in parallel.
Oh look! Another asshat attempting to googlebomb someone via their sig. /. adds 'rel="nofollow"' to links in .sigs, making your feeble attempt completely pointless.
<RedForman>DUMBASS!</RedForman>
--
Anonymous Coward - Educating dumbass Slashdotters since 2005.
> it would actually be possible to compile C code (such as the Linux kernel) into a gate array and run it on such a chip.
That fact is a lot like the fact that Saturn would float if you dropped it in the ocean. Both are technically true (Saturn is mostly hydrogen and helium and really is less dense than water), but Linux will no more fit in any existing FPGA than Saturn will fit in any existing ocean. Chuckle.
If you compile Gentoo Linux with full space optimizations, you can fit it onto a four-function solar-powered calculator, with room left over for Tetris and Minesweeper.
See also this post.