Stretch Announces Chip That Rewires Itself On The Fly
tigre writes "CNET News reports on a chip startup called Stretch, which produces the S5000, a RISC processor with electronically programmable hardware so that it can add to its instruction set as it deems necessary. Thus it can re-configure itself to behave like a DSP, or a (digital) ASIC, and perform the equivalent of hundreds of instructions in one cycle. Great way to bridge the gap between general-purpose computing and ASICs."
Can you imagine the virus you could write if you could change the instruction set of the cpu?
If this doesn't represent the death of the megahertz as a processor-benchmark standard, I don't know what will...
/. cliche, but... imagine a cluster of these!
Effective application speed was never based on a cycle count alone, because different processors can have better instruction sets for the given application. The main breakthrough here is that this chip leaves "user-definable" space in its instruction set so it can be re-optimized on the fly. Whatever you're running, its most commonly used functions can almost slide from being code to being "on the chip", and that's sure to improve the speed you actually experience.
Yeah, I know its a
Give these damn chips awhile to evolve and you'll have borg nanoprobes... Beware the nanoprobes!!
And it will ship with a free copy of Duke Nukem Forever, right?
Just imagine a Beowulf Clu...oh. Skynet. Right.
Let's not do this one.
cool. -One step closer to Judgement Day
... wake me up when i can buy a thousand of them for $10 a piece ...
[okay, okay, so it'll be -hell- fun to design codecs and other protocols that can switch their chipset dynamically, yeah, but i'd need 1000's of them deployed to have a real reason to do it...]
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
"I see that you are (insert processor mumbojumbo.) Would you like me to reconfigure my instruction sets?"
Of course, there is no such thing as a universal solution and the Stretch processor does have its limits. One significant area is in "low touch" operations such as network processors. While it can certainly do the relatively simple packet inspection and transformation that switch fabrics and network processors normally handle, it is really much better suited to the heavy-duty calculation- and manipulation-intensive tasks found in "high touch" applications such as video compression. For example, H.263/264 motion estimation is capable of producing very high-quality video from a relatively small bit stream, but requires lots (and lots) of raw processing horsepower. Happily, the Stretch processor is only too happy to oblige, churning out a SAD (sum-absolute difference) operation on a tile-full of pixels for H.263 video in 43 ns (H.264 takes 83 ns).
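For anyone unfamiliar with the SAD operation mentioned above, here is a minimal sketch in plain Python of what gets computed per block during motion estimation. The function names `sad` and `best_match` and the list-of-rows block layout are illustrative assumptions, not anything from Stretch's toolchain; the chip does this in configurable hardware rather than in a loop.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks.

    Each block is a list of rows, each row a list of integer pixel values.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
    return total


def best_match(target, candidates):
    """Index of the candidate block with the lowest SAD against the target."""
    return min(range(len(candidates)), key=lambda i: sad(target, candidates[i]))
```

Motion estimation repeats this comparison for many candidate positions per block, which is why collapsing the inner loops into one wide hardware operation pays off.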
HIV Crosses Species Barrier... into Muppets
I think we're going to have to move the crypto benchmarks back a step when this tech comes out. Not very many of us have RISC chips that are optimized for MD5 or any of the other popular crypto formulas, but if the typical consumer PC had this technology, we could all effectively have an on-demand RISC for whatever we need at the moment sitting in our PCs.
In short, the time-to-crack using consumer technologies for almost any form of crypto is about to take a step backwards. It won't "break" anything, but the brute force combinations will be able to be examined in a faster time, meaning higher standards will be needed for the same level of protection you have today.
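To put a rough number on "higher standards": for exhaustive key search, every doubling of search speed costs one bit of effective key length. A back-of-the-envelope sketch follows; the function names are made up for illustration, and any speedup figure you plug in is an assumption, not something measured on this chip.

```python
import math


def bits_of_margin_lost(speedup):
    """Bits of key length an attacker effectively gains from a search speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return math.log2(speedup)


def seconds_to_search(key_bits, keys_per_second):
    """Worst-case time to try every key of the given length."""
    return 2 ** key_bits / keys_per_second
```

So even a hypothetical 1000x faster brute-force engine only eats about 10 bits of margin, which matches the point above that nothing "breaks"; the bar just moves.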
Not surprising, these breakthroughs will always keep coming...
Is this the only technology they managed to salvage from the android's severed hand? Any interesting gears and motors at all?
Don't blame Durga. I voted for Centauri.
How can something that normally takes "hundreds of thousands of instructions" be handled in a single instruction? Surely all the same mathematical operations must take place, except for some optimization. Or is it a matter of a certain structure for computation being created in a more permanent fashion rather than being dynamically formed upon demand? Then the operations could be performed in a single cycle. On the other hand, that portion of the processor would become useless to other tasks. Or am I misunderstanding this entirely?
IANAEE, but I was just wondering if this technology provides greater advantages to unique monolithic apps as opposed to apps targeted for virtual machines such as the JVM or CLR. Those VMs are general-purpose, and maybe apps that run on them would be "invisible" to the hardware reprogrammability... however I don't know how just-in-time native compilation might change that picture. Anyone with knowledge of this stuff care to enlighten?
It's called DISC, Dynamically Reconfigurable-Set Computer. It's existed for a few years now. If I remember correctly, there is a group at Berkeley working in the area that has released a few nice papers on it.
I remember a project where hardware engineers set up a CPU to modify itself until it learned to do a task by itself. It got to the point where the hardware was doing the right thing, not because the hardware was reconfigured properly, but because the software was using minute nuances in the electricity flowing through to get the job done. Even the hardware designers had no idea how it could possibly be working.
Cue Skynet jokes...GO!
Sooooo this T800 model Terminator walks into a bar with a poodle under one arm and a basketball under the other...
...I sense another Transmeta coming on...
Yes sure, rewirable chips would be cool for certain applications, but how does one go about making it deal with multiple applications with multiple needs? You'd overload the CPU with a truckload of specialized instructions, which would probably slow it down. Granted, I see uses in things like mobile phones, but for multitasking machines, a 'Jack of all trades' chip is the way to go.
From what I gathered, this allows the compiler to create an instruction that can do a lot of work in one instruction, NOT for the processor to decide to create an instruction. Think of it this way, if you know you need to do something like an array multiply many times, the compiler could create an instruction for it, and then use it as needed. The key to this is that the instruction set can be optimized on a program basis, so you don't waste gates on SSE2 instructions if you don't use them, etc.
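A toy way to picture the point above: count how many instructions get issued for a dot product with only base multiply/add, versus with one compiler-created fused instruction. This is a sketch under loud assumptions: the `lanes=4` width and the idea that the fused operation issues as a single instruction are hypothetical, and none of these names come from Stretch's tools.

```python
def dot_base(a, b):
    """Dot product using only base ops; returns (result, instructions issued)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    issued = 0
    for x, y in zip(a, b):
        acc += x * y
        issued += 2  # one multiply plus one add per element pair
    return acc, issued


def dot_extended(a, b, lanes=4):
    """Same result using a hypothetical fused multiply-accumulate instruction
    that consumes up to `lanes` element pairs per issue."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    issued = 0
    for start in range(0, len(a), lanes):
        chunk = zip(a[start:start + lanes], b[start:start + lanes])
        acc += sum(x * y for x, y in chunk)
        issued += 1  # the whole chunk counts as one custom instruction
    return acc, issued
```

Both versions do the same arithmetic; the extended one just packs it into fewer, wider instructions, which is the trade the compiler is making on your behalf.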
This would compare with FPGAs, I believe, in that most FPGA applications are fixed once loaded, although I know that there was talk about stargate systems on slashdot (http://slashdot.org/article.pl?sid=03/02/15/1629237&mode=nested&tid=126) using FPGAs for general processing before.
For the most part, with FPGAs you build the code from scratch: you give the chip its identity, how it works, what it does, and so on.
This chip sounds like a hybrid between an FPGA and a run of the mill general purpose RISC processor. Being based on a RISC instruction set, you code for it as you would a normal processor, however if the compiler sees code which could take advantage of having more CPU support, it could add instructions to the FPGA like portion of the chip to enable better throughput.
The short summary is: FPGA, programmed from scratch. Standard RISC processor: Already has an instruction set which you program against.
This could be quite handy for some of the embedded programming I do.
This is evolutionary, not revolutionary. Many chipmakers have offered microcontrollers and microprocessors with FPGA on chip. Often it is an extension of the I/O built into the processor, so it's not much different than an external FPGA on the processor bus. Please note that this is NOT like processors that run on the FPGA itself - these are separate from the FPGA portion of the chip.
Stretch is different in a few ways:
It pulls the FPGA closer to the core, so that it can be utilized almost as part of the pipeline. I say almost because of the following statement in the article:
Inside the chip, the ISEF is coupled to the rest of the circuit by 128-bit buses and has 32 128-bit registers. It runs in parallel with other areas of the processor, effectively becoming a fully reconfigurable co-processor, and can be reprogrammed for new instructions at any time during operation.
So it's still fairly separate from the processor core.
But the core itself is high performance (fast clock, a little faster than the average FPGA) and it has a very fast memory bus (again faster than the average FPGA).
The downsides are likely to be:
1) Power cost and dissipation. Since it's a slow clock, the dissipation probably won't be bad, but it's not going into a small portable machine.
2) Time to reconfigure. This isn't meant to be a general processor with task switching. Context and task switching is going to be expensive, and if you plan on running two concurrent tasks which both require special instructions, the entire processor will likely perform, on average, much worse than it would without the reconfigurable portion. Unless, of course, the processes were created to use the same set of special instructions, so the context switch isn't more expensive than it is for today's processors.
So they are targeting it correctly, it seems. Specialized areas with, in general, only one task/program running at a time. Multimedia players, for example, would be great here. A digital recorder/player would work well if both the encoding and decoding portions of the code were compiled so the special instructions created wouldn't have to be changed for either application to allow playback while recording.
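The reconfiguration worry can be put as a simple cost model: how much of each time slice is left for useful work once you pay for the context switch and, if the incoming task needs a different instruction set, the reload of the programmable fabric. Everything here is hypothetical; the article gives no reconfiguration time, so the parameters are placeholders to plug your own guesses into.

```python
def useful_fraction(slice_time, switch_time, reconfig_time, same_config):
    """Fraction of a time slice spent on real work.

    slice_time    -- length of one scheduling slice (any time unit)
    switch_time   -- ordinary context-switch overhead (same unit)
    reconfig_time -- assumed time to reload the custom instructions
    same_config   -- True if both tasks share one set of custom instructions
    """
    if slice_time <= 0:
        raise ValueError("slice_time must be positive")
    overhead = switch_time + (0.0 if same_config else reconfig_time)
    return max(0.0, slice_time - overhead) / slice_time
```

With made-up numbers (10 ms slices, 0.1 ms switch, 5 ms reload), sharing a configuration keeps about 99% of the slice, while alternating configurations leaves under half. That is the recorder/player case above in miniature.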
-Adam
That's already here. It's called "C".
Ita erat quando hic adveni.
This sounds vaguely like the dream solution for developers. The article says:
Does that mean it can handle booting multiple OSes simultaneously? If so, how long before someone writes an app that bridges multiple OSes, allowing the equivalent of emulation, without the emulation? I don't know about the rest of you, but the potential of this chip sounds like a dream come true. And at $35-$100 per chip... it's cheaper than the processor for most systems anyway.
The first processor that can add to its instruction set while operating? I think there were a few microprogrammed processors in the 70s/80s with writable control store that could do exactly that. Anybody remember PERQ workstations? Now this new gadget appears to be able to extend itself by means of an embedded FPGA, instead of plain old microcode, so it's a bit like the Xilinx Virtex II PRO series (PowerPC core with big FPGA on one chip). The really innovative thing is that you don't have to program the FPGA in VHDL or Verilog, but the C++ compiler takes care of that.
This is basically an FPGA married to a RISC processor. So if you have a bit of RISC code that can be simulated using the FPGA portion, and you have enough spare cells to add it, and it takes 10 clock cycles for the FPGA "user instruction" to dispatch, but it takes 200 to do it outright in the original RISC instructions, then you're experiencing a 20 to 1 speed increase for that bit. You speed up the function without overclocking. Actually what you've done is "trade off".
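The 20-to-1 figure above applies only to the bit of code that moved into the FPGA portion. For the program as a whole, the usual Amdahl's-law arithmetic applies. A small sketch follows; the function names are just for illustration, and the 200 and 10 cycle counts are the example numbers from the comment, not measurements.

```python
def kernel_speedup(base_cycles, custom_cycles):
    """Speedup of the accelerated piece alone, e.g. 200 cycles down to 10."""
    return base_cycles / custom_cycles


def overall_speedup(fraction_accelerated, speedup):
    """Amdahl's law: whole-program speedup when only part of the run time
    (fraction_accelerated, between 0 and 1) gets the given speedup."""
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / speedup)
```

If that 20x piece is half of the run time, the whole program gets a bit under 2x faster, so how much the "trade off" buys depends on how hot the accelerated code is.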
He could have posted clearer, if he wasn't trying for first post.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
One of the best applications for this chip is a programmable Graphics card.
Imagine the optimizations that you could do for the next release of the Doom engine. They could own the market for GPUs that optimize themselves for specific games. Could be amazing.
Sunny
There is a much, much better article with lots more detail on EETimes.com.
I can just see this processor, mixed with a bit of Mark Tilden's analog AI research, really advancing artificial intelligence. For the uninitiated, Mark Tilden discovered that by tying together a group of only four or so transistors and sending a regular analog signal through them, he could get small robots to walk, and indeed do an amazing number of things, including optimizing their path and even remembering their solution for a small amount of time (about 3 or 4 seconds). Not only that, but when given a certain stimulus need (for example, make them solar powered and have only one area of light), they would compete with other bots to gain access to better light. Indeed, a lot of the behavior that these little bots produce is so complex and lifelike that he has spent a long time just documenting behaviour. Now give a set of these bots' circuits the ability to "optimize" the speed of the signal, and a few stimuli, and let it play. If the stimulus was for "human approval", some input from a human indicating good or bad.... Heck, what do I know, I'm no AI researcher, but it always sounded cool to me :-)
For more information on Mark Tilden, go to BEAM Online.
This is not a sig
I tried to do something like this once, but I kept running into the problem of differential voltages in the pulse-modulated ion core.
Ahh - that's easy. You should have routed the ion core voltages through a phase discriminator; would have cleared that right up.
I think they must have shunted the positrons through the floating point pathways
No, that would have caused a cascade failure in the deflector array.
FPGA in this context means Field Programmable Gate Array.
I highly doubt anyone is planning on making PCs with these. They are designed for being a processor in something like a data logging / control system, surveillance video compression, etc. Your system will probably have no need for virus detection any more specific than other more general regression and test suites it will need during operation.
This will be useful in the places that they mentioned: places where you do a lot of processing that takes many generic instructions but can be translated into a single string of discrete instructions.
The more I think about it, this is the direction processors are going. We keep moving processors towards RISC based cores. We keep adding specialized paths for things such as multimedia. Eventually we WILL have half the processor being a purely RISC core and half being programmable hardware for specialized computational intensive instructions. I retract my initial view.
I do wonder, though, what the life is on the hardware side. How many times can you reprogram the hardware before it starts to die? What is the error rate in reprogramming it? What happens when a few programmable transistors die?
I do security
Short answer: FPGAs let you build using basic gates and (very small) lookup tables. This lets you build anything you please, and fully optimize the number of functional units of each type that you have, but has a speed and size penalty.
This chip is basically a RISC processor with an FPGA-type fabric bolted on as a co-processor, as far as I can tell from the detail-poor press release. By implementing most of the instruction pipeline as fixed, optimized hardware, it runs without any of the penalties of a purely FPGA-based implementation. When you have a number-crunching task that would benefit from a custom logic implementation enough to offset the performance penalty of implementing it in programmable logic blocks, the compiler configures the programmable logic into a suitable coprocessor which is stuck in as an extra branch of the instruction pipeline.
How much benefit you get from this depends on what you're doing. Modern general-purpose microprocessors have enough vector instructions to handle most DSP-ish tasks without an abysmal speed penalty (just a large size and power penalty over a purely DSP-based implementation). Most computing tasks aren't limited by processing horsepower at all - they're either waiting for memory accesses to complete (even cache accesses are very slow compared to register accesses), or they're waiting for the target address of a branch to be decided (speculation and BTBs don't address this perfectly by a long shot). A reconfigurable processor would suffer from much the same type of problem. While using the programmable logic path for slice processing could remove some of the branching penalties (by following all paths and selecting the desired result), this would be at an even greater area and power cost.
For specialized applications, it would be quite useful, of course.
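The "following all paths and selecting the desired result" idea can be sketched in software as a branchless select: compute both outcomes, then pick one with a mask instead of a jump. The names below are illustrative only, and Python still branches internally; the point is the shape of the computation that wide programmable logic could do in parallel.

```python
def select(condition, if_true, if_false):
    """Pick one of two already-computed integers without an if/else.

    The mask is all ones (-1) when the condition holds and zero otherwise,
    so exactly one of the two AND terms survives.
    """
    mask = -int(bool(condition))
    return (if_true & mask) | (if_false & ~mask)


def saturating_add(x, y, limit):
    """Add two integers, clamping at limit, by evaluating both paths."""
    total = x + y      # path taken when there is no overflow
    clamped = limit    # path taken when the sum exceeds the limit
    return select(total > limit, clamped, total)
```

Both paths always get evaluated, which is exactly the extra area and power cost mentioned above: you pay for the work you throw away.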
A quick glossary of terms being thrown around, for anyone confused:
FPGA: This is a combination of lookup tables, sum-of-products combinational logic blocks, and scratch-pad SRAM that you can hook up in nearly arbitrary ways to produce custom circuits at a gate level. Bulky and slow, but good at implementing algorithms efficiently. Configuration information is loaded from a serial PROM chip at startup, letting you change it relatively easily.
CPLD: Like an FPGA, but stores configuration information internally, so you need to take out the CPLD and burn it to change configuration instead of re-burning the configuration PROM.
PLD: Little cousin to CPLD. This is what you played with in second or third year. Typically these are just a sum-of-products combinational logic block with a register stuck on the end to latch the output. Useful as glue logic.
ASIC: This is an integrated circuit that's half-made. A number of gates and registers and so forth have been fabricated on the chip, and the lowest few metal layers have been used for internal routing for these, but you get to define the upper metal layers to form arbitrary connections among these (either as the last fabrication step, or by laser-cutting a pre-fabricated wiring mesh to leave the geometry you want). Works much like a CPLD, but the design is decided at fabrication time and cannot be changed. Faster and less bulky than a CPLD implementation.
Standard cell: This is a custom-fabricated integrated circuit that uses cells from a standard library of components, usually automatically placed and routed from a VHDL or Verilog description of what you want the chip to do. Faster than an ASIC if you have good place and route software, but more expensive in small quantities because you're making what amounts to a full custom chip. Design time is much less than a fully custom design would be, though (but verifying that the design description is correct is a royal pain).
I hope this clears things up for anyone who was confused.
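Since lookup tables came up in the descriptions above, here is a minimal software model of one: any small boolean function is stored as a precomputed truth table, and "evaluating" the logic is just an indexed read. The helper names `make_lut` and `lut_eval` are invented for this sketch, and real FPGA cells add registers, carry chains, and routing that are not modeled here.

```python
def make_lut(func, n_inputs):
    """Precompute the truth table of a boolean function of n_inputs bits.

    Entry k holds func applied to the bits of k, least significant bit first.
    """
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> position) & 1 for position in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *bits):
    """Look up the output for the given input bits (same bit order as above)."""
    index = 0
    for position, bit in enumerate(bits):
        index |= (bit & 1) << position
    return table[index]
```

For example, a one-bit full adder needs two such tables, one built from `a ^ b ^ c` for the sum and one from the majority function for the carry. Reconfiguring the chip amounts to loading different table contents and different wiring between them.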
Sounds like this would be a perfect processor for emulating consoles such as the SNES, XBOX, GameCube, PS2, etc etc or pretty much any other processor.