Stretch Announces Chip That Rewires Itself On The Fly

virus hitting the hardware by KDN · 2004-04-26 07:09 · Score: 5, Insightful

Can you imagine the virus you could write if you could change the instruction set of the cpu?

Re:virus hitting the hardware by Neil+Blender · 2004-04-26 07:14 · Score: 2, Funny

Can you imagine the virus you could write if you could change the instruction set of the cpu?

Uh, no.
Re:virus hitting the hardware by NanoGator · 2004-04-26 07:21 · Score: 4, Interesting

"Can you imagine the virus you could write if you could change the instruction set of the cpu?"

Forgive my ignorance, but why would this be any different than the virus you can write with the general purpose CPUs we have today? You could make the machine unreliable, but that wouldn't make for an effective virus distributing machine.

--
"Derp de derp."
Re:virus hitting the hardware by pmiller396 · 2004-04-26 07:42 · Score: 3, Funny

> You could make the machine unreliable, but that wouldn't make for an effective virus distributing machine.

10,000,000 Windows machines can't be wrong!
Re:virus hitting the hardware by Short+Circuit · 2004-04-26 07:45 · Score: 4, Insightful

Interesting point.

People developing along similar lines must have means of controlling the new circuitry so that hot spots don't form on the die. Especially if they provide analog capability. It could be too easy to set up a feedback that could really trash that part of the die.

Which brings up another thought: Do they have an on-board controller that tracks what parts of the die are usable and what aren't? If they do, they can have seriously high production yields.

In fact, I wouldn't be surprised if such a self-diagnostic utility made its way into modular dies with specialized circuitry. So a processor could run on two AMUs instead of three, and so forth.

--
tasks(723) drafts(105) languages(484) examples(29106)
Re:virus hitting the hardware by AmericanInKiev · 2004-04-26 08:59 · Score: 2, Insightful

I wouldn't bet on that.

A minor change in the instruction set would likely render the OS dysfunctional, and while that would certainly get attention, it would not propagate very well.

There is a math about viruses which requires them not to kill their hosts, and to do as little damage really as they can bear. Damaging viruses get high priority on fix lists and would get shut down more quickly than less harmful viruses.

I think a CPU change virus would be a rather self-defeating proposition.

New application-speed records to be set... by LostCluster · 2004-04-26 07:10 · Score: 4, Insightful

If this doesn't represent the death of the megahertz as a processor-benchmark standard, I don't know what will...

Effective application speed was never based on a cycle count alone, because different processors can have better instruction sets for the given application. The main breakthrough here is that this chip leaves "user-definable" space in its instruction set so they can re-optimize the instruction set on the fly. Whatever you're running, its most commonly used functions can almost slide from being code to being "on the chip" and that's sure to speed up the experienced speed.

Yeah, I know it's a /. cliche, but... imagine a cluster of these!

Re:New application-speed records to be set... by Stripe7 · 2004-04-26 07:25 · Score: 4, Interesting

This looks interesting. At this generation it looks to be aimed at dedicated applications. You code for your particular application and use their compiler, which restructures the CPU to optimize for that application. What it does not say is whether the hardware changes are read/write. If you release a maintenance patch to your application, do you have to swap in a new CPU for optimal performance? If the area is read/write, just how many times can you change the CPU instruction set? Can you change the CPU instruction set with something other than their compiler, that is, with a microcode release that rewrites the CPU? I would not want to load a compiler onto every one of my products.

Beware! by spudthepotatofreak · 2004-04-26 07:10 · Score: 5, Funny

Give these damn chips awhile to evolve and you'll have borg nanoprobes... Beware the nanoprobes!!

Sure it will.... by WebMasterJoe · 2004-04-26 07:10 · Score: 5, Funny

And it will ship with a free copy of Duke Nukem Forever, right?

--
I really hate signatures, but go to my website.

so does that mean... by hatrisc · 2004-04-26 07:11 · Score: 2, Insightful

we can have only one standard assembly language? the hell with java if that's the case.

--
I write code.

Re:so does that mean... by tuffy · 2004-04-26 07:24 · Score: 5, Informative

we can have only one standard assembly language?

That's already here. It's called "C".

--
Ita erat quando hic adveni.
Re:so does that mean... by sketerpot · 2004-04-26 09:03 · Score: 2, Informative

There's a cool library called GNU Lightning which will generate machine code at runtime, which is good for JITs and such. It isn't exactly what you're looking for, but it illustrates that having a standard assembly language (or, much more likely, several standard assembly languages!) isn't all that far off.

Whoa.. by Anonymous Coward · 2004-04-26 07:11 · Score: 5, Funny

Just imagine a Beowulf Clu...oh. Skynet. Right.

Let's not do this one.

One word . . . by Revolution+9 · 2004-04-26 07:11 · Score: 3, Funny

cool. -One step closer to Judgement Day

yawn ... by torpor · 2004-04-26 07:12 · Score: 4, Insightful

... wake me up when i can buy a thousand of them for $10 a piece ...

[okay, okay, so it'll be -hell- fun to design codecs and other protocols that can switch their chipset dynamically, yeah, but i'd need 1000's of them deployed to have a real reason to do it...]

--
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --

So, do they have Chippy? by Neil+Blender · 2004-04-26 07:12 · Score: 5, Funny

"I see that you are (insert processor mumbojumbo.) Would you like me to reconfigure my instruction sets?"

Can someone explain? by aiken_d · 2004-04-26 07:12 · Score: 2, Interesting

How is this different from FPGA's?

Thanks
-b

--
If I wanted a sig I would have filled in that stupid box.

Re:Can someone explain? by DaHat · 2004-04-26 07:24 · Score: 4, Informative

For the most part, with an FPGA you build its code from scratch: you give it its identity, how it works, what it does and so on.

This chip sounds like a hybrid between an FPGA and a run of the mill general purpose RISC processor. Being based on a RISC instruction set, you code for it as you would a normal processor. However, if the compiler sees code which could take advantage of having more CPU support, it could add instructions to the FPGA-like portion of the chip to enable better throughput.

The short summary is: FPGA, programmed from scratch. Standard RISC processor: already has an instruction set which you program against.

This could be quite handy for some of the embedded programming I do.

--
Help Brendan pay off his student loans
Re:Can someone explain? by falzer · 2004-04-26 07:32 · Score: 3, Informative

FPGA in this context means Field Programmable Gate Array.
Re:Can someone explain? by pelgv · 2004-04-26 07:35 · Score: 2, Informative

FPGA stands for Field Programmable Gate Array... and it is a chip that can be programmed, and re-programmed... The programming is a low level one... even lower than micros... you design it for electrical connection between gates...

dunno where u got that definition...

more info by morcheeba · 2004-04-26 07:12 · Score: 5, Informative

NetworkZone has a product review with some more insight. A good quote:

...the [300 MHz] Stretch even beats the Intrinsity FastMath processor running at 2 GHz

Of course, there is no such thing as a universal solution and the Stretch processor does have its limits. One significant area is in "low touch" operations such as network processors. While it can certainly do the relatively simple packet inspection and transformation that switch fabrics and network processors normally handle, it is really much better suited to the heavy-duty calculation- and manipulation-intensive tasks found in "high touch" applications such as video compression. For example, H.263/264 motion estimation is capable of producing very high-quality video from a relatively small bit stream, but requires lots (and lots) of raw processing horsepower. Happily, the Stretch processor is only too happy to oblige, churning out a SAD (sum of absolute differences) operation on a tile-full of pixels for H.263 video in 43 ns (H.264 takes 83 ns).
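
For anyone wondering what the SAD operation in that quote actually computes, here is a minimal software sketch. The tile size, the pixel values and the function name `sad` are made up for illustration; the article doesn't say how Stretch lays out a tile.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel tiles."""
    same_rows = len(block_a) == len(block_b)
    same_cols = all(len(ra) == len(rb) for ra, rb in zip(block_a, block_b))
    if not (same_rows and same_cols):
        raise ValueError("tiles must have the same shape")
    # Walk both tiles in lockstep and accumulate |a - b| per pixel.
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical 2x2 luma tiles (real motion estimation uses larger ones).
current = [[10, 12], [200, 50]]
reference = [[11, 12], [190, 60]]
print(sad(current, reference))  # 1 + 0 + 10 + 10 = 21
```

Motion estimation runs this over many candidate positions per tile, which is why doing the whole tile in one hardware operation matters so much.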

--
HIV Crosses Species Barrier... into Muppets

This is a setback for crypto-land... by LostCluster · 2004-04-26 07:12 · Score: 4, Insightful

I think we're going to have to move the crypto benchmarks back a step when this tech comes out. Not very many of us have RISC chips that are optimized for MD5 or any of the other popular crypto formulas, but if the typical consumer PC had this technology, we could all effectively have an on-demand RISC for whatever we need at the moment sitting in our PCs.

In short, the time-to-crack using consumer technologies for almost any form of crypto is about to take a step backwards. It won't "break" anything, but the brute force combinations will be able to be examined in a faster time, meaning higher standards will be needed for the same level of protection you have today.

Not surprising, these breakthroughs will always keep coming...

Re:This is a setback for crypto-land... by jsac · 2004-04-26 07:22 · Score: 2, Insightful

Luckily it will also immensely speed up encryption times. So, on the whole, probably a gain for the white hats rather than the black hats.

--
"The urge to fly from modern systems, instead of moving through them to even greater, fairer things is, I think, an indi
Re:This is a setback for crypto-land... by Jerf · 2004-04-26 07:42 · Score: 3, Insightful

Along with jsac's comment (more processor power exponentially benefits encryptors, only linearly benefits crackers, on the whole more power means a win for encryptors), I'd like to point out this is only a setback for encryption inasmuch as encryptors claim that their encryption will keep your data safe for all time. Which is to say, at least for the reputable encryptors, this isn't a setback at all.

If you insist on putting words in their mouth, then yeah, you might consider it a setback. But that's your misunderstanding, not theirs. All reputable encryptors have accounted for Moore's Law in their cost/benefits tradeoffs. Since it doesn't take much encryption power before it requires computers larger than the Universe to crack it via brute force (and since "cracks" on good encryption are really typically just ways of collapsing the search space, not procedures that give immediate answers, often adding more bits will require Universe-sized machines, too), this isn't that big a deal for encryption. Push your key size up and be done with it. Even conventional machines can handle that today, it just takes longer.
Re:This is a setback for crypto-land... by Jerf · 2004-04-26 10:20 · Score: 2, Interesting

We do, say, 2048-bit encryption (asymmetric), because it would be "too slow" to do 20480-bit encryption. "Too slow" here is a fuzzy term, but generally speaking, if you're sending an encrypted email you don't want to hit "send" and have it delayed for three weeks while it gets encrypted. There's no real reason we couldn't do it today.

As computers speed up, both encryption and decryption get faster. However, while adding another 128 bits to a 128-bit symmetric cipher may be "free" with newer computers (and eventually will be), the 2^128 multiplicative increase to the space the decrypters have to search is not free. To increase encryption power, the encryptors merely double their work. (To an approximation; I don't think the work load is strictly linear but it's a lot closer to that than exponential, and that's all that matters.) Meanwhile, for that relatively modest investment in encryption power, the decrypters' job got 340,282,366,920,938,463,463,374,607,431,768,211,456 times harder.
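
The arithmetic is easy to check for yourself. A quick sketch (the function names are mine, and the 1000x hardware speedup is a hypothetical figure, not something claimed for the Stretch chip):

```python
import math


def keyspace_growth(extra_bits):
    """Factor by which a brute-force search grows when extra_bits are added to the key."""
    return 2 ** extra_bits


def bits_offset_by_speedup(speedup):
    """Extra key bits needed to cancel out a linear hardware speedup."""
    return math.log2(speedup)


print(f"{keyspace_growth(128):,}")
# 340,282,366,920,938,463,463,374,607,431,768,211,456

# Hypothetical: a cracker gets hardware that is 1000x faster.
print(round(bits_offset_by_speedup(1000), 2))  # about 9.97 bits
```

So a thousandfold speedup for the cracker is cancelled by roughly ten more key bits, which is the linear-versus-exponential point in a nutshell.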

This is why, in the relatively near future, we'll all have encryption that is effectively "unbreakable", because no conceivable decrypter could be built that could do the calculations to crack the encryptions, even with the raw materials of the entire Universe.

Practically speaking, most of us already have damn-near unbreakable encryption today; if you're connecting to a computer with SSH, SSH is most likely the strongest link in your security chain by far; the weak links are the computers on each end of the link, the humans on each end of the link, and possibly the facilities the computers are in. Nobody is going to tap your ssh stream and get any value from the massive decryption effort that would be necessary unless you're trading secrets worth billions.

Specialized hardware can only gain you a linear speed up, at best, and those calculations for "minimal computer" to crack a given encryption key are not extrapolated from modern computers, they are extrapolated from the maximum computation possible to do, given a finite energy supply. (QM-based computation advocates may wait until they have a large-scale (multi-thousand-qubit) machine to jump in here.)

Anything more? by AtariAmarok · 2004-04-26 07:13 · Score: 4, Funny

Is this the only technology they managed to salvage from the android's severed hand? Any interesting gears and motors at all?

--
Don't blame Durga. I voted for Centauri.

How is it possible? by dhasenan · 2004-04-26 07:14 · Score: 5, Insightful

How can something that normally takes "hundreds of thousands of instructions" be handled in a single instruction? Surely all the same mathematical operations must take place, except for some optimization. Or is it a matter of a certain structure for computation being created in a more permanent fashion rather than being dynamically formed upon demand? Then the operations could be performed in a single cycle. On the other hand, that portion of the processor would become useless to other tasks. Or am I misunderstanding this entirely?

Re:How is it possible? by Professr3 · 2004-04-26 07:22 · Score: 3, Informative

Say you had to compute a 10000-entry sin/cos table (simple example). The processor would reconfigure itself to perform sin/cos operations in a single cycle (parallel ALUs etc.) and, if there were enough configurable circuits, perhaps multiple sin/cos table entries at once. That's where the speed advantage is - large blocks of repetitious calculations. With a sophisticated enough reprogramming AI, computationally intensive apps like video games could get a huge performance boost.
Re:How is it possible? by Chirs · 2004-04-26 07:23 · Score: 2, Informative

You hit upon the answer in the latter portion of your post. Most cpus are generalists--they're fast at most things, but aren't optimized for anything. This kind of tech allows you to optimize your cpu for a particular task.

If you have something that needs to do a simple operation on each member of a large data set, the chip could be configured as many tiny simple cores that are just smart enough to do that operation.

Or if you needed to do a complicated math function, you could optimize the cpu for that function.

Of course, it takes a certain amount of time to do the reconfiguration, so it may only pay off for many repetitions or very complex calculations.
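
A back-of-the-envelope way to see when it pays off. All the cycle counts below are invented for illustration; nobody here has real reconfiguration timings for the Stretch part.

```python
def break_even_calls(reconfig_cycles, software_cycles, hardware_cycles):
    """Smallest number of calls at which reconfiguring beats plain software.

    Reconfiguring wins once reconfig_cycles + n * hardware_cycles
    is strictly less than n * software_cycles.
    """
    saved_per_call = software_cycles - hardware_cycles
    if saved_per_call <= 0:
        return None  # the custom instruction never pays for itself
    return reconfig_cycles // saved_per_call + 1


# Hypothetical numbers: 100,000 cycles to reconfigure, 500 cycles per call
# in software, 5 cycles per call with the custom instruction.
print(break_even_calls(100_000, 500, 5))  # 203
```

Below that call count you are better off leaving the chip alone and just running the code.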
Re:How is it possible? by radish · 2004-04-26 07:26 · Score: 2, Informative

I studied "Custom Computing" as it was called at my university a few years ago. That was based around using FPGAs as the processor, but with the same idea of doing on-the-fly redesign of your hardware to suit the current problem.

The basic idea is to move problems from the time space (i.e. do X then Y then Z taking T time to do it) to the physical space (i.e. do X next to Y next to Z taking S transistors to do so, but only one cycle). So your simple add operation in a regular microprocessor, which fetches the data and runs them through a generic arithmetic unit before putting the result back somewhere would instead have the load, add and store circuitry "hard coded" in actual transistors.

It takes some serious mental acrobatics for a programmer like me, which probably led to my not-so-stellar performance in that class ;) But it sure is interesting.

--
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
Re:How is it possible? by the+morgawr · 2004-04-26 07:27 · Score: 2, Informative

It's a DSP/RISC processor (basically the same thing) with an on-chip FPGA. If you have some particular algorithm, you can put it on the FPGA to get a solution instead of having to use code. (this is a lot harder to explain than I thought it would be....)

--
The policy of the United States is worse than bad---it is insane. -- Ludwig von Mises, Economic Policy(1959)
Re:How is it possible? by Zordak · 2004-04-26 09:08 · Score: 2, Informative

> There is the analysis required to even determine that the incoming instructions require sin/cos. Then there has to be a lookup into a rule table for how to rewrite the gates to optimize for this. Then that rule needs to be applied. You have to be able to show me that this can all be done faster and cheaper than an x86 at 4 GHz just ramming it through. Maybe it can, but I am skeptical.

You are making the assumption that all of this is done on the fly. It's not. The compiler would, at compile time, locate candidates for hardware optimization, or the programmer would specify them explicitly. Also, it wouldn't use a "lookup table." It would basically be Verilog or VHDL, which would compile into netlists, which are placed and routed, all as part of the build process. So, the compiled program includes instructions to reconfigure the dynamic portion of the processor.

Sure, each reconfiguration has some overhead attached to it, but remember that computers excel at repetitive tasks. You configure, for example, a Laplace transform circuit once, and use it multiple times throughout your program. Since the configurable portion has enough space to handle a number of special instructions, you put your heaviest, most-used instructions in hardware, and you are now doing complex transforms in a handful of cycles instead of hundreds (or more). Remember that executing an instruction in hardware is orders of magnitude faster than doing it in software. So, for sufficiently complex operations, you could realize huge, huge performance gains, even if you had to reconfigure the dynamic instruction every single time.

I attended school at a place where some grad students were doing research into this very technology, and although I was a freshman at the time, I knew enough to understand how they could claim significant speed gains.

--

Today's Sesame Street was brought to you by the number e.

Finally by Anonymous Coward · 2004-04-26 07:14 · Score: 2, Funny

I can tell my computer to go fuck itself and it will.

Reduced Benefits for Virtual Machines? by SlipJig · 2004-04-26 07:15 · Score: 3, Insightful

IANAEE, but I was just wondering if this technology provides greater advantages to unique monolithic apps as opposed to apps targeted for virtual machines such as the JVM or CLR. Those VMs are general-purpose, and maybe apps that run on them would be "invisible" to the hardware reprogrammability... however I don't know how just-in-time native compilation might change that picture. Anyone with knowledge of this stuff care to enlighten?

--
Read my keyboard review.

Not really new technology by stephenry · 2004-04-26 07:15 · Score: 5, Informative

It's called DISC, the Dynamic Instruction Set Computer. It's existed for a few years now. If I remember correctly, there is a group at Berkeley working in the area that has released a few nice papers on it.

Re:Not really new technology by wed128 · 2004-04-26 07:27 · Score: 2, Insightful

yea, but a working implementation is a long way from a concept paper...

That reminds me of... by ajiva · 2004-04-26 07:16 · Score: 4, Interesting

I remember a project where hardware engineers set up a CPU to modify itself until it learned to do a task by itself. It got to the point where the hardware was doing the right thing, not because the hardware was reconfigured properly, but because the software was using minute nuances in the electricity flowing through to get the job done. Even the hardware designers had no idea how it could possibly be working.

Re:That reminds me of... by itp · 2004-04-26 07:38 · Score: 4, Interesting

It was an FPGA, and it wasn't the CPU modifying itself, it was a genetic algorithm designing a circuit that would perform a specific task (differentiate between two different ranges of input signals, IIRC).

The interesting result was that the circuit designed by the GA didn't use conventional structures, but instead, according to traditional circuit design theory, should not have functioned at all -- dead loops, etc. The behavior and result was tied to the physical FPGA being used to test and give feedback to the GA -- the minute nuances, as you referred to them -- and was not portable to even another instance of the exact same FPGA.
Re:That reminds me of... by bigbigbison · 2004-04-26 07:54 · Score: 4, Interesting

I remember reading about this in either Popular Science or Discover magazine. I seem to remember that the head researcher took the chips to another building or room to show them off and they didn't work. Then took them back to the room they came from and they worked again. They finally determined that the rooms had slightly different temperatures and the chips were so specific to that environment that changing the temperature even a tiny bit stopped them from working.
Crazy stuff.

--
http://www.popularculturegaming.com -- my blog about the culture of videogame players
Re:That reminds me of... by jcorgan · 2004-04-26 07:59 · Score: 4, Informative

This was Adrian Thompson's doctoral thesis in 1996.
He used a Xilinx FPGA and a genetic algorithm (implemented separately) to evolve a circuit which could distinguish (IIRC) two different frequency tones on the input as a logic level output. The "program" was allowed to interconnect the FPGA configurable logic blocks in any old sort of way internally and between CLBs. This would include ways which would cause logic designers to shudder in horror :), and did not include a clock input to the circuit at all.
The result was a successful circuit that used a relatively small portion of the FPGA. But working out how it accomplished the tone discrimination was impossible. There were sub-circuits that were isolated from the rest of the circuit but when removed would cause the circuit to fail. Thompson hypothesized that the circuits were taking advantage of "out of band" communication via electromagnetic or thermal influences on adjacent CLBs.
Furthermore, the circuits turned out to be very specific to the ambient temperature during training and usage, as well as being specific to a particular FPGA used (a working circuit on one would fail on another.)
In any case it was a fascinating small-scale exploration of what reconfigurable hardware and genetic algorithms could accomplish, when not constrained by the "clock driven sequential logic" paradigm nearly all human engineered circuits use.
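
For the curious, the general shape of such a genetic algorithm can be sketched in a few lines. To be clear, this is NOT Thompson's setup: his fitness was measured on the physical FPGA, whereas the `toy_fitness` below just counts bits matching an arbitrary target pattern, and every parameter (population size, mutation rate, genome length) is an assumption picked for illustration.

```python
import random


def evolve(fitness, genome_bits=32, population=30, generations=200,
           mutation_rate=0.02, seed=0):
    """Evolve a bitstring with truncation selection, one-point crossover and mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_bits)] for _ in range(population)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: population // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < population:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_bits)  # one-point crossover
            child = mom[:cut] + dad[cut:]
            # Flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = parents + children
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history


# Stand-in for "configuration bits that make the circuit work".
TARGET = [1, 0] * 16


def toy_fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))


best, history = evolve(toy_fitness)
print(toy_fitness(best), "of", len(TARGET), "bits match")
```

The interesting part of the real experiment is exactly what this toy leaves out: when fitness is measured on real silicon, the search is free to exploit whatever physics that particular chip happens to have.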

--
Babies are cute because they have to be.

damn!! by Mastadex · 2004-04-26 07:17 · Score: 2, Funny

I'd like to welcome our new reprogrammed overlords...

--
A morning without coffee is like something without something else.

errmm... by torpor · 2004-04-26 07:17 · Score: 2, Funny

... earth to slashdoid,

being code to being "on the chip" and that's sure to speed up the experienced speed.

first, where exactly is code run, if it isn't 'on a chip', and second, what? speed up the experienced speed?

you mean, as opposed to something like 'pretended speed', which is what i imagine you were using to measure your rapid desire to let your undoubtedly 'speedy' fingers get through your slashdot post without thinking ...

'experienced speed' indeed...

--
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --

Re:errmm... by fitten · 2004-04-26 07:44 · Score: 2, Informative

> When a function is defined in code, you have to use multiple processor cycles to complete the function. However, when the function is "on the chip", that entire function can be completed in just one assembly-level call to the processor.

But you cannot say that one "assembly level call" to the processor will take (even) fewer "processor cycles" to complete. Hint: very few instructions in even today's CPUs take a single clock cycle to execute, most take several, it's just with pipelining, many instructions have a retirement rate of one (or more) per-clock.
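
To put numbers on the latency-versus-retirement-rate distinction, here is a sketch of an idealized pipeline. The assumptions are loud ones: independent instructions, no stalls, and a made-up 5-cycle latency.

```python
def total_cycles(instructions, latency, pipelined=True):
    """Cycles to finish a run of independent instructions on an idealized core."""
    if instructions == 0:
        return 0
    if pipelined:
        # The first result appears after `latency` cycles, then one retires per clock.
        return latency + (instructions - 1)
    # Without pipelining, every instruction pays the full latency.
    return latency * instructions


# Hypothetical 5-cycle instruction, 1000 of them back to back.
print(total_cycles(1000, 5, pipelined=True))   # 1004
print(total_cycles(1000, 5, pipelined=False))  # 5000
```

Each instruction still takes 5 cycles either way; only the throughput looks like "one per clock".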

This isn't a silver bullet. In fact, the big deal about this thing is that it combines an FPGA and the processor onto a single chip. Before this, you'd write it all and implement it on a single FPGA, where it would be generally slow/simple for the general purpose part or you'd use an FPGA as a co-processor and feed it with a host CPU.

Re:Cue Skynet jokes by WwWonka · 2004-04-26 07:17 · Score: 5, Funny

Cue Skynet jokes...GO!

Sooooo this T800 model Terminator walks into a bar with a poodle under one arm and a basketball under the other...

Sounds good on paper, but... by Anonymous Coward · 2004-04-26 07:18 · Score: 5, Insightful

...I sense another Transmeta coming on...

Yes sure, rewirable chips would be cool for certain applications, but how does one go about making it deal with multiple applications with multiple needs? You'd overload the CPU with a truckload of specialized instructions, which would probably slow it down. Granted, I see uses in things like mobile phones, but for multitasking machines, a 'Jack of all trades' chip is the way to go.

Re:Sounds good on paper, but... by exp(pi*sqrt(163)) · 2004-04-26 10:18 · Score: 2, Informative

You have OS support. New instructions are a resource that the OS manages. Too many processes want to add their own instructions? Then when a context switch takes place the OS overwrites instructions for the outgoing context with instructions for the new one. Same as managing small amounts of RAM by swapping.
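
A toy model of that idea, treating custom-instruction slots like a tiny cache with least-recently-used eviction. The class name, the slot count and the instruction names are all hypothetical; no real OS or Stretch API is being described here.

```python
from collections import OrderedDict


class InstructionSlots:
    """Toy model of an OS swapping custom instructions in a fixed number of slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.loaded = OrderedDict()  # (process, instruction) keys, oldest first
        self.reconfigurations = 0

    def use(self, process, instruction):
        """Return True if the fabric had to be reconfigured for this call."""
        key = (process, instruction)
        if key in self.loaded:
            self.loaded.move_to_end(key)  # already in hardware, mark as recently used
            return False
        if len(self.loaded) >= self.capacity:
            self.loaded.popitem(last=False)  # evict the least recently used instruction
        self.loaded[key] = True
        self.reconfigurations += 1
        return True


slots = InstructionSlots(capacity=2)
slots.use("codec", "sad")         # loads
slots.use("codec", "dct")         # loads
slots.use("codec", "sad")         # hit, no reconfiguration
slots.use("crypto", "md5_round")  # evicts ("codec", "dct")
slots.use("codec", "dct")         # has to be loaded again
print(slots.reconfigurations)     # 4
```

Just like swapping, it works fine until the working set of instructions exceeds the slots, at which point you thrash.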

--
Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.

not quite accurate summary by ebrandsberg · 2004-04-26 07:19 · Score: 3, Interesting

From what I gathered, this allows the compiler to create an instruction that can do a lot of work in one instruction, NOT for the processor to decide to create an instruction. Think of it this way, if you know you need to do something like an array multiply many times, the compiler could create an instruction for it, and then use it as needed. The key to this is that the instruction set can be optimized on a program basis, so you don't waste gates on SSE2 instructions if you don't use them, etc.

This would compare with FPGA's I believe in that most FPGA applications are fixed once loaded, although I know that there was talk about stargate systems on slashdot (http://slashdot.org/article.pl?sid=03/02/15/1629237&mode=nested&tid=126) using FPGA's for general processing before.

Insightful?! by CedgeS · 2004-04-26 07:21 · Score: 2, Funny

Wow! The virus could execute arbitrary code! Just like if it could choose which of the existing instructions were executed by another processor. The core part of your virus could run faster, maybe in just one clock cycle!

Re:Insightful?! by CedgeS · 2004-04-26 07:33 · Score: 4, Interesting

Easy - say the extra instructions are supposed to perform a matrix convolution. Call extra instruction 1 with some random matrix. If it doesn't calculate the same thing as a slow version run in the regular RISC part, you know extra instruction 1 has in some way failed and needs to be reprogrammed. Your virus software, OS, etc. should never have special instructions and are always run in the regular RISC part.
I highly doubt anyone is planning on making PCs with these. They are designed for being a processor in something like a data logging / control system, surveillance video compression, etc. Your system will probably have no need for virus detection any more specific than other more general regression and test suites it will need during operation.
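
The cross-check from the first paragraph, sketched in software. I'm using a 1-D convolution instead of a matrix one to keep it short, and `corrupted_op` is a made-up stand-in for a custom instruction that has been tampered with.

```python
import random


def reference_convolve(signal, kernel):
    """Slow, plain-software 1-D convolution used as the trusted answer."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def needs_reprogramming(fast_op, trials=20, seed=0):
    """Feed random inputs to fast_op and compare against the reference version."""
    rng = random.Random(seed)
    for _ in range(trials):
        signal = [rng.randint(-9, 9) for _ in range(8)]
        kernel = [rng.randint(-9, 9) for _ in range(3)]
        if fast_op(signal, kernel) != reference_convolve(signal, kernel):
            return True  # mismatch: reload the custom instruction
    return False


def corrupted_op(signal, kernel):
    """Hypothetical tampered instruction: right answer except for one element."""
    result = reference_convolve(signal, kernel)
    result[0] += 1
    return result


print(needs_reprogramming(reference_convolve))  # False
print(needs_reprogramming(corrupted_op))        # True
```

Random testing like this only catches instructions that are wrong on the inputs you happen to try, so treat it as a sanity check rather than a proof.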
Re:Insightful?! by Short+Circuit · 2004-04-26 08:00 · Score: 2, Interesting

Reprogrammable processors would be great for PCs as a sort of subprocessor. Games could offload calculations for their physics and AI models. Spreadsheets could offload all sorts of calculations. Mathematics-intensive applications could implement their own random-number generating algorithm.

In fact, there may be advantages to dumbing down the CPU somewhat. Remove some of the SIMD instructions in favor of applications and libraries implementing more specialized routines in the subprocessor.

--
tasks(723) drafts(105) languages(484) examples(29106)
Re:Insightful?! by Jennifer+E.+Elaan · 2004-04-26 10:29 · Score: 2, Insightful

Actually, it's almost certainly based on standard SRAM FPGA technology. It's quite cheap in terms of power, and not especially expensive in terms of time, to reprogram, and there is no degradation over time from doing it too often. The only real disadvantage is that it might be entirely possible to create on-die shorts with bad programming data, as it currently is in FPGA's.

--
Hardware, software, and blinking lights!

PLD's have been around for years. by dispater124 · 2004-04-26 07:22 · Score: 2, Informative

The concept of a programmable hardware device isn't all that new. And the encoding and encryption they talk about speeding up is a typical application of PLD's. High end routers use similar devices to optimize their tables etc. Kuro5hin has a nice article for beginners. http://www.kuro5hin.org/story/2004/2/27/213254/152

FPGA by tttonyyy · 2004-04-26 07:23 · Score: 2, Interesting

FPGAs have had processor IPs available for a while, which, in theory, can be reprogrammed on the fly. But AFAIK, no-one does this. I doubt this will be any different.

Hardware manufacturers that need special hardware operations (e.g. MPEG-2 decoding) use dedicated, custom hardware for large volume production. Dynamically configurable hardware is expensive for large-scale production, and small-scale production will likely use FPGAs for similar effect. I may be sceptical, but I doubt it'll catch on.

--
biopowered.co.uk - catalytically cracking triglycerides for home automotive use since 2008. Just say no to big oil!

Not too different from what's already available... by stienman · 2004-04-26 07:24 · Score: 5, Informative

This is evolutionary, not revolutionary. Many chipmakers have offered microcontrollers and microprocessors with FPGA on chip. Often it is an extension of the I/O built into the processor, so it's not much different than an external FPGA on the processor bus. Please note that this is NOT like processors that run on the FPGA itself - these are separate from the FPGA portion of the chip.

Stretch is different in a few ways:
It pulls the FPGA closer to the core, so that it can be utilized almost as part of the pipeline. I say almost because of the following statement in the article:
Inside the chip, the ISEF is coupled to the rest of the circuit by 128-bit buses and has 32 128-bit registers. It runs in parallel with other areas of the processor, effectively becoming a fully reconfigurable co-processor, and can be reprogrammed for new instructions at any time during operation.

So it's still fairly separate from the processor core.

But the core itself is high performance (fast clock, a little faster than the average FPGA) and it has a very fast memory bus (again faster than the average FPGA)

The downsides are likely to be:
1) Power cost and dissipation. Since it's a slow clock, the dissipation probably won't be bad, but it's not going into a small portable machine.
2) Time to reconfigure. This isn't meant to be a general processor with task switching. Context and task switching is going to be expensive, and if you plan on running two concurrent tasks which both require special instructions, the entire processor will likely perform, on average, much worse than it would without the reconfigurable portion. Unless, of course, the processes were created to use the same set of special instructions, so the context switch isn't more expensive than it is for today's processors.
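
To put rough numbers on the reconfiguration worry in point 2, here is a back-of-envelope sketch. Every figure in it (time slice, reconfiguration time, speedup) is a made-up assumption, not anything Stretch has published, and the function name is hypothetical too.

```python
def effective_speedup(slice_ms, reconfig_ms, speedup):
    """Speedup relative to a plain core when the fabric is reloaded
    at the start of every time slice.

    slice_ms:    scheduler time slice (assumed value)
    reconfig_ms: time to load a new instruction set into the fabric (assumed)
    speedup:     how much faster the task runs once its instructions are loaded
    """
    if reconfig_ms >= slice_ms:
        return 0.0  # the whole slice is spent reconfiguring
    useful_ms = slice_ms - reconfig_ms
    # Work done in one slice, measured in "plain core milliseconds".
    work_done = useful_ms * speedup
    return work_done / slice_ms


# Two tasks with different custom instructions: every switch reloads the fabric.
print(effective_speedup(slice_ms=10, reconfig_ms=1, speedup=5))  # 4.5
print(effective_speedup(slice_ms=10, reconfig_ms=9, speedup=5))  # 0.5
# Both tasks compiled against the same instruction set: no reload on a switch.
print(effective_speedup(slice_ms=10, reconfig_ms=0, speedup=5))  # 5.0
```

With these invented numbers, the chip ends up slower than a plain core (0.5x) once reloading eats most of the slice, which is the "much worse than without the reconfigurable portion" case.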

So they are targeting it correctly, it seems. Specialized areas with, in general, only one task/program running at a time. Multimedia players, for example, would be great here. A digital recorder/player would work well if both the encoding and decoding portions of the code were compiled so the special instructions created wouldn't have to be changed for either application to allow playback while recording.

-Adam

How will this affect cross-platform development? by ezraekman · 2004-04-26 07:25 · Score: 3, Interesting

This sounds vaguely like the dream solution for developers. The article says:

"It runs in parallel with other areas of the processor, effectively becoming a fully reconfigurable co-processor, and can be reprogrammed for new instructions at any time during operation."

Does that mean it can handle booting multiple OSes simultaneously? If so, how long before someone writes an app that bridges multiple OSes, allowing the equivalent of emulation, without the emulation? I don't know about the rest of you, but the potential of this chip sounds like a dream come true. And at $35-$100 per chip... it's cheaper than the processor for most systems anyway.

The first processor that can? by mrplado · 2004-04-26 07:27 · Score: 5, Informative

The first processor that can add to its instruction set while operating? I think there were a few microprogrammed processors in the 70s/80s with writable control store that could do exactly that. Anybody remember PERQ workstations? Now this new gadget appears to be able to extend itself by means of an embedded FPGA, instead of plain old microcode, so it's a bit like the Xilinx Virtex II PRO series (PowerPC core with big FPGA on one chip). The really innovative thing is that you don't have to program the FPGA in VHDL or Verilog, but the C++ compiler takes care of that.

Well... by Ayanami+Rei · 2004-04-26 07:27 · Score: 3, Informative

This is basically an FPGA married to a RISC processor. So if you have a bit of RISC code that can be simulated using the FPGA portion, and you have enough spare cells to add it, and it takes 10 clock cycles for the FPGA "user instruction" to dispatch, but it takes 200 to do it outright in the original RISC instructions, then you're experiencing a 20 to 1 speed increase for that bit. You speed up the function without overclocking. Actually what you've done is "trade off".
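
The 200-versus-10 cycle figures above are only an illustration, but they make the arithmetic easy to check. The sketch below also applies Amdahl's law to show why the whole program doesn't get 20 times faster. The 50% fraction is purely an assumption, and the function names are made up for this example.

```python
def local_speedup(cycles_before, cycles_after):
    """Speedup of the one routine that moved into the fabric."""
    return cycles_before / cycles_after


def overall_speedup(fraction_accelerated, speedup):
    """Amdahl's law: only part of the run time benefits from the speedup."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / speedup)


s = local_speedup(200, 10)
print(s)                                  # 20.0
# Assume the accelerated routine was half of the original run time.
print(round(overall_speedup(0.5, s), 2))  # 1.9
```

So a 20 to 1 gain on "that bit" is worth a bit under 2 to 1 overall if that bit was half the run time.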

He could have posted clearer, if he wasn't trying for first post.

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON

Re:Well... by torpor · 2004-04-26 07:48 · Score: 2, Interesting

i could imagine it not so much as an 'optimization' device, but as a complete 'system-description' protocol machine.

in other words, i can not only embed codec details in my datastream to you, but at the beginning of it all, i can give you a 'cpu package' that you can use to run my custom codec, perhaps just once...

what interests me about the S5000 is, what of the S5500, &etc? do they have plans to segregate cores from each other in other ways - say by way of a 'certificate broker' chip, also on-board?

because if so, this could be a real boon for future media control, as long as the other reasons for this chip's success actually are also fruitful, and result in a real market deployment.

being able to change not just instructions, but what those instructions mean, dynamically over a protected core, would give software a new protection mechanism, is what i'm trying to get at ...

--
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --

Gaming? by shirai · 2004-04-26 07:28 · Score: 3, Interesting

One of the best applications for this chip is a programmable Graphics card.

Imagine the optimizations that you could do for the next release of the Doom engine. They could own the market for GPUs that optimizes itself for specific games. Could be amazing.

--
Sunny

Be my Friend

Better article on EETimes by apirkle · 2004-04-26 07:28 · Score: 4, Informative

There is a much, much better article with lots more detail on EETimes.com.

Woooo by Cr3d3nd0 · 2004-04-26 07:28 · Score: 3, Interesting

I can just see this processor, mixed with a bit of Mark Tilden's analog AI research, really advancing Artificial Intelligence. For the uninitiated: Mark Tilden discovered that by tying together a group of only four or so transistors and sending a regular analog signal through it, he could get small robots to walk, and indeed do an amazing number of things, including optimizing their path and even remembering the solution for a small amount of time (about 3 or 4 seconds). Not only that, but when given a certain stimulus need (for example, make them solar powered and provide only one area of light), they would compete with other bots to gain access to better light. Indeed, a lot of the behavior that these little bots produce is so complex and lifelike that he has spent a long time just documenting it. Now give a set of these bots' circuits the ability to "optimize" the speed of the signal, plus a few stimuli, and let it play. If the stimulus was for "human approval", some input from a human indicating good or bad... Heck, what do I know, I'm no AI researcher, but it always sounded cool to me :-) For more information on Mark Tilden go to BEAM Online

--
This is not a sig

Re:Woooo by narcc · 2004-04-26 07:54 · Score: 3, Informative

Some more information

--
Required reading for internet skeptics

Re:Hmmmm... by schon · 2004-04-26 07:29 · Score: 5, Funny

I tried to do something like this once, but I kept running into the problem of differential voltages in the pulse-modulated ion core.

Ahh - that's easy. You should have routed the ion core voltages through a phase discriminator; would have cleared that right up.

I think they must have shunted the positrons through the floating point pathways

No, that would have caused a cascade failure in the deflector array.

Re:Ummm... by narcc · 2004-04-26 07:31 · Score: 2, Insightful

Nope
See the script

--
Required reading for internet skeptics

even more info by Anonymous Coward · 2004-04-26 07:32 · Score: 2, Informative

EE Times has an article here. Apparently this chip has a competitor. There's also more details about the chip itself.

(Anonymous because logging in at work)

One piece missing for genetic processing... by 192939495969798999 · 2004-04-26 07:35 · Score: 2, Interesting

That insanely complicated piece of software that can automatically figure out what it needs the chip to do at any given time for its own survival --
oh yeah, we have those... PEOPLE! Now, can I get those neural processor connects and graft this thing to my head already?

--
stuff |

Re:hahahahaha ... Worst Math Ever by claar · 2004-04-26 07:39 · Score: 2, Informative

Well, even if his math was wrong, his point is still valid.. going from 5 trillion years to 5 billion years isn't much different (of course, even 128 bit encryption is currently thought to take much longer than a measly 5 trillion years to brute force).

Most cryptology systems are purposefully designed to take an absolutely absurd amount of time to crack -- exactly to account for many of these instant 1000 fold improvements.
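
For anyone who wants to see how little a 1000-fold improvement buys, here is a quick sanity check. The guess rate of 10^12 keys per second is an arbitrary assumption for illustration, not a measured figure for any real hardware.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits, keys_per_second):
    """Expected years to brute-force a key (half the keyspace on average)."""
    return (2 ** key_bits / 2) / keys_per_second / SECONDS_PER_YEAR


base = years_to_search(128, 1e12)           # assumed baseline attacker
faster = years_to_search(128, 1e12 * 1000)  # same attacker, 1000x faster
print(f"{base:.1e} years")    # 5.4e+18 years
print(f"{faster:.1e} years")  # 5.4e+15 years
```

A 1000x speedup is worth roughly 10 bits of key length (2^10 = 1024), so adding ten bits to the key cancels it out.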

--
I'd give my right arm to be ambidextrous...

new concept, but not new hardware by Gyorg_Lavode · 2004-04-26 07:40 · Score: 3, Insightful

The idea of programmable chips is nothing new. Xilinx etc. have been doing it forever. The idea of putting both a standard core w/ a generic instruction set AND a programmable core on the same chip is very interesting. It will, however, be a niche product. You aren't going to use it in your home computer because your home computer does a broad range of things.

This will be useful in places that they mentioned. Places where you do a lot of processing that takes many generic instructions but can be translated into a single string of discrete instructions.

The more I think about it, this is the direction processors are going. We keep moving processors towards RISC-based cores. We keep adding specialized paths for things such as multimedia. Eventually we WILL have half the processor being a purely RISC core and half being programmable hardware for specialized, computationally intensive instructions. I retract my initial view.

I do wonder though, what the life is on the hardware side. How many times can you reprogram the hardware before it starts to die? What is the error rate in reprogramming it? What happens when a few programmable transistors die?

--
I do security

This != New by sam_van · 2004-04-26 07:40 · Score: 2, Informative

I've noticed some folks comparing this to Transmeta. While similar, there are a few more comparable architectures out there.

Perhaps the most notable (in its conception, at least) was Seymour Cray's attempt at a Pentium Pro core + reprogrammable extensions (via FPGA or the like) at his post-Cray Research company. More recently, IBM licensed PowerPC cores for use by Xilinx. Up to four of those cores get thrown on the die with a Virtex-II FPGA (?); each of the cores has the ability to add opcodes in FPGA land.

Even more recently was my last company's valiant effort at something similar (and even cooler). RIP, SiliconMobius.

--
Thinking of starting a business in Minnesota? Me too! mnsmall.biz

FPGAs and the rest of the acronym zoo. by Christopher+Thomas · 2004-04-26 07:43 · Score: 5, Informative

How is this different from FPGA's?

Short answer: FPGAs let you build using basic gates and (very small) lookup tables. This lets you build anything you please, and fully optimize the number of functional units of each type that you have, but has a speed and size penalty.
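
If the "lookup table" part sounds abstract, here is a toy software model of one. It is a sketch of the general idea only: the class name and the 4-input size are assumptions for illustration and don't describe any particular vendor's logic cell.

```python
class LUT4:
    """A 4-input lookup table: 16 configuration bits, one per input combination."""

    def __init__(self, config_bits):
        if not 0 <= config_bits < 2 ** 16:
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.config_bits = config_bits

    @classmethod
    def from_function(cls, fn):
        """'Compile' any 4-input boolean function into configuration bits."""
        bits = 0
        for index in range(16):
            a, b, c, d = ((index >> i) & 1 for i in range(4))
            if fn(a, b, c, d):
                bits |= 1 << index
        return cls(bits)

    def evaluate(self, a, b, c, d):
        # The inputs form an address; the output is the stored bit at that address.
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.config_bits >> index) & 1


# The same cell becomes an AND gate or a parity checker
# depending only on the bits loaded into it.
and4 = LUT4.from_function(lambda a, b, c, d: a & b & c & d)
parity = LUT4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(and4.config_bits))        # 0x8000
print(hex(parity.config_bits))      # 0x6996
print(parity.evaluate(1, 0, 1, 1))  # 1
```

Reprogramming the chip amounts to loading different bits into a large number of cells like this (plus the routing between them), which is also where the speed and size penalty comes from.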

This chip is basically a RISC processor with an FPGA-type fabric bolted on as a co-processor, as far as I can tell from the detail-poor press release. By implementing most of the instruction pipeline as fixed, optimized hardware, it runs without any of the penalties of a purely FPGA-based implementation. When you have a number-crunching task that would benefit from a custom logic implementation enough to offset the performance penalty of implementing it in programmable logic blocks, the compiler configures the programmable logic into a suitable coprocessor which is stuck in as an extra branch of the instruction pipeline.

How much benefit you get from this depends on what you're doing. Modern general-purpose microprocessors have enough vector instructions to handle most DSP-ish tasks without an abysmal speed penalty (just a large size and power penalty over a purely DSP-based implementation). Most computing tasks aren't limited by processing horsepower at all - they're either waiting for memory accesses to complete (even cache accesses are very slow compared to register accesses), or they're waiting for the target address of a branch to be decided (speculation and BTBs don't address this perfectly by a long shot). A reconfigurable processor would suffer from much the same type of problem. While using the programmable logic path for slice processing could remove some of the branching penalties (by following all paths and selecting the desired result), this would be at an even greater area and power cost.
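
The "following all paths and selecting the desired result" trick can be modelled in a few lines. This is only a software illustration with invented arithmetic: in Python the final conditional expression is still a branch to the interpreter, whereas in logic it would be a multiplexer.

```python
def branchy(x):
    # Conventional code: the pipeline has to guess which way this goes.
    if x < 0:
        return -x * 3
    return x + 7


def both_paths(x):
    # Hardware style: compute both results unconditionally...
    taken = -x * 3
    not_taken = x + 7
    # ...then a multiplexer picks one once the condition is known.
    select = x < 0
    return taken if select else not_taken


print(all(branchy(x) == both_paths(x) for x in range(-5, 6)))  # True
```

Both results get computed every time, which is the extra area and power cost mentioned above.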

For specialized applications, it would be quite useful, of course.

A quick glossary of terms being thrown around, for anyone confused:

FPGA - Field Programmable Gate Array.
This is a combination of lookup tables, sum-of-products combinational logic blocks, and scratch-pad SRAM that you can hook up in nearly arbitrary ways to produce custom circuits at a gate level. Bulky and slow, but good at implementing algorithms efficiently. Configuration information is loaded from a serial PROM chip at startup, letting you change it relatively easily.
CPLD - Complex Programmable Logic Device.
Like an FPGA, but stores configuration information internally, so you need to take out the CPLD and burn it to change configuration instead of re-burning the configuration PROM.
PLA/PLD - Programmable Logic Array/Device.
Little cousin to CPLD. This is what you played with in second or third year. Typically these are just a sum-of-products combinational logic block with a register stuck on the end to latch the output. Useful as glue logic.
ASIC - Application-Specific Integrated Circuit.
This is an integrated circuit that's half-made. A number of gates and registers and so forth have been fabricated on the chip, and the lowest few metal layers have been used for internal routing for these, but you get to define the upper metal layers to form arbitrary connections among these (either as the last fabrication step, or by laser-cutting a pre-fabricated wiring mesh to leave the geometry you want). Works much like a CPLD, but the design is decided at fabrication time and cannot be changed. Faster and less bulky than a CPLD implementation.
Standard cell design.
This is a custom-fabricated integrated circuit that uses cells from a standard library of components, usually automatically placed and routed from a VHDL or Verilog description of what you want the chip to do. Faster than an ASIC if you have good place and route software, but more expensive in small quantities because you're making what amounts to a full custom chip. Design time is much less than a fully custom design would be, though (but verifying that the design description is correct is a royal pain).
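
For the "sum-of-products" blocks mentioned under FPGA and PLA/PLD, here is a minimal software sketch of an AND-OR plane. The function name and the dict-based encoding of product terms are assumptions made up for this example.

```python
def sum_of_products(product_terms, inputs):
    """Evaluate a PLA-style AND-OR plane.

    product_terms: list of dicts mapping input name -> required value (0 or 1);
                   inputs not mentioned in a term are "don't care".
    inputs:        dict mapping input name -> current value (0 or 1).
    """
    return int(any(
        all(inputs[name] == wanted for name, wanted in term.items())
        for term in product_terms
    ))


# XOR as a sum of products: (a AND NOT b) OR (NOT a AND b)
xor_terms = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, sum_of_products(xor_terms, {"a": a, "b": b}))
```

Each dict is one AND term, and the list is the OR of those terms; "programming" the device means choosing which inputs feed which term.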

I hope this clears things up for anyone who was confused.

Real World Performance by AhBeeDoi · 2004-04-26 07:44 · Score: 2, Insightful

Stretch claims that their CPU running at 300MHz has shown superior performance to a 2GHz box. We have no details of their testing and I wonder about the real world performance.

Natural questions come to mind: how quickly does the chip configure itself to optimize for the application, does the configuration only occur at the start of the application, how many chip-configuring applications can it run concurrently, will it optimize for interpreted languages, can some configurations be made "permanent" to accommodate the OS used? I can see how this chip would optimize some specialized tasks, but I don't know if it will run well in an environment where many different types of tasks are expected to run at the same time.

Another issue relating to gaining acceptance is whether Stretch releases specs so that others can write their own compilers. Is Stretch pursuing a pure hardware strategy (not trying to sell compilers, create their own OS, etc)?

Re:Ok froggy. by Neil+Blender · 2004-04-26 07:45 · Score: 2, Funny

Je ne se quoi?

It means, well, it means... Uh, actually, I don't know quite how to describe it.

Compelling market proposition? by Ars-Fartsica · 2004-04-26 07:46 · Score: 2, Insightful

General purpose CPUs are fast, ubiquitous, and cheap. While compelling, this new approach is in no sense a slam-dunk in the market. Stretch will have to show a compelling case why this is a faster and cheaper alternative to the x86 (compatible) hegemony.

Perfect for emulation by arock99 · 2004-04-26 07:46 · Score: 3, Interesting

Sounds like this would be a perfect processor for emulating consoles such as the SNES, XBOX, GameCube, PS2, etc etc or pretty much any other processor.

Field Programmable Gate Arrays (FPGAs) by Lust · 2004-04-26 07:53 · Score: 2, Interesting

This reminds me of Field Programmable Gate Arrays. Can someone explain the difference?

stop the madness. by twitter · 2004-04-26 07:58 · Score: 2, Insightful

How do you detect a virus that has control of the underlying hardware though...

The same way you detect a virus on any machine that has been compromised, with another machine and or a thorough understanding of normal operation and running processes. Nothing new here. Evaluate the harm done by a potential compromise and take steps accordingly.

There is no practical difference between a hardware and a software compromise and the remedy is the same. Indeed, for critical purposes, there's little difference between a hardware compromise and a simple failure. You should anticipate it and not get burnt. The bottom line is know your shit and be in control when strange things happen.

Security is a process and must be applied system wide. If you don't have reasonable configuration control, you are already lost. If you run junky closed software that's full of bugs and does not keep track of uid, pid or processes themselves you are always in for a rough ride. The trouble given you there will distract your operators, like it did for the last big blackout. Every piece has to be considered in context. It's not hard, it just takes time, organization and judgment.

I hate how Luddites always look at any new tool and cry out, "look how awful [insert wonderful new power] is!"

--

Friends don't help friends install M$ junk.

Based on Tensilica Xtensa technology by Anonymous Coward · 2004-04-26 07:59 · Score: 2, Insightful

Looking at their brochure, it is based on Tensilica Xtensa technology (www.tensilica.com) which I know has been around for at least 3 years. Nothing remarkable. Many companies have developed similar products.

Help for a n00b. by Paulrothrock · 2004-04-26 08:24 · Score: 2

What's sky net? Terminator reference? Huh?

--
I'm in the hole of the broadband donut.

Been there, done that by TheAncientHacker · 2004-04-26 08:39 · Score: 2, Interesting

The original design for the Zilog Z-80000 (Not to be confused with the Z80000 that actually shipped and was an enhanced Z8001) was also dynamically self configuring and optimized its execution based on the frequency of use of instructions.

Of course, that was only a little over 20 years ago.

FYI: Since somebody is going to ask... The original Z80000 design was killed when Zilog stalled out as a general purpose processor maker and moved into embedded processors after the bugs in the initial run of Z8001 chips and IBM's selection of the Intel 8088.

Two companies announced similar products today by gupg · 2004-04-26 09:28 · Score: 2, Informative

It seems Stretch is not the only company that announced such a product today: EE Times article.
Also, keep in mind, customizable ISAs have been around for a while -- in Tensilica and ARC processors. These guys do it dynamically.

Altera's Nios Processor by cybergibbons · 2004-04-26 10:33 · Score: 2, Interesting

I'm currently working on modular multiprocessor systems implemented on FPGAs, so this field is something I know something about.

Altera produce an FPGA with one or more built-in ARM processors. This sounds very similar to the Stretch system, but the ARM processors are limited in connection into the fabric of the FPGA by the not particularly fast bus used with the processor. Stretch appear to have made the data transfer rate between the two parts of utmost importance, which is essential in high throughput applications like this.

Altera have also developed a softcore processor, that is, one implemented entirely on an FPGA. It is highly configurable - instructions can be added, cache and memory behavior altered, buses adapted, etc. Coupled with things such as the DSP blocks (trees of multiply accumulates), a 50MHz processor can process data in a specific task at the same rate as a general purpose processor running at 10 times the speed.
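
The "same rate at a tenth of the clock" claim is just clock rate multiplied by work per cycle. A rough sketch follows; the ops-per-cycle figures are assumptions picked to make the numbers line up, not measurements of any real part.

```python
def throughput_mops(clock_mhz, ops_per_cycle):
    """Millions of operations per second for a given clock and per-cycle width."""
    return clock_mhz * ops_per_cycle


# Assumed: the soft core's DSP blocks finish 10 multiply-accumulates per cycle,
# while the general purpose processor finishes 1 per cycle.
soft_core = throughput_mops(clock_mhz=50, ops_per_cycle=10)
general_purpose = throughput_mops(clock_mhz=500, ops_per_cycle=1)
print(soft_core, general_purpose)  # 500 500
```

The catch is that the task has to offer that much parallelism in the first place; a task with a long serial dependency chain would get no such benefit.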

The work I'm doing is investigating the use of many of these processors on one FPGA. Levels of optimisation that cannot be done with conventional multiprocessor systems will be possible. Changing the memory system to deal with specific algorithms, or bus widths between certain processors, will allow much better performance.

Stretch also seems to be making a difference by claiming to have easy to use and working development tools, which is one thing that Altera cannot really claim to have done.

processor + logic by period3 · 2004-04-26 11:55 · Score: 2, Informative

Though not the same as this, the Xilinx Virtex-II Pro combines an FPGA and PowerPC RISC core on the same chip.

The Altera Excalibur does something similar with an ARM processor core and programmable logic.

Both of these have been around for a while...
