Start-up Could Kick Opteron into Overdrive

Berkeley by 2.7182 · 2006-04-23 23:38 · Score: 2, Interesting

I thought they had done this out at Berkeley a while back. Is it really a new thing?

Re:Berkeley by Feminist-Mom · 2006-04-23 23:40 · Score: 1

Well, I think Bajcsy and Sastry at EECS Berkeley had done something like this, but I don't know if that is the same gadget as this to tell you the truth.
Re:Berkeley by dingDaShan · 2006-04-23 23:45 · Score: 2, Insightful

I'm sorry, but 5k for a little chip that makes my opteron a little faster? I could just buy another opteron for that price (http://www.pricewatch.com/cpu/419325-1.htm). The price is supposed to drop to 3k next year. How does this affect cooling?
Re:Berkeley by Whiney+Mac+Fanboy · 2006-04-24 00:00 · Score: 4, Interesting

I'm sorry, but 5k for a little chip that makes my opteron a little faster? I could just buy another opteron for that price (http://www.pricewatch.com/cpu/419325-1.htm). The price is supposed to drop to 3k next year.

You're quite right that these are not for you; they're for running highly specialised calculations (the oil & gas industries are mentioned in TFA).

They make some operations much faster (think of a hardware MPEG decoder: useless for most things, but much more efficient at the single thing it can do than a general-purpose CPU).

How does this affect cooling?

These things consume 10-20 watts compared to an Opteron's 80, so their effect on cooling is minimal (far less than adding the second Opteron that you propose).

--
There are shills on slashdot. Apparently, I'm one of them.
Re:Berkeley by Anonymous Coward · 2006-04-24 00:14 · Score: 0

Well, most of the CS and Math departments don't really go outside to participate, so what everyone sees is the people who chose a major that leaves them free time for demonstrations and whatnot.

We don't like outside.
Re:Berkeley by mwvdlee · 2006-04-24 00:18 · Score: 1

RTFA

They claim 10-20x the performance of an Opteron for specific tasks. They also claim 3x the price/performance of an Opteron.
Since it costs about 3x the price of an Opteron and performs at least 10x better, their 3x price/performance claim seems pretty valid.
Of course, it needs to be programmed for highly specific tasks. But chances are that if you're in the Opteron-buying market, you need it for highly specific tasks.
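A back-of-the-envelope version of the arithmetic above, reading "price/performance" as performance per dollar and normalising one Opteron to price 1 and performance 1. The 3x price and 10-20x speedup are the figures quoted in the thread; the normalisation and the function name are illustrative assumptions:

```python
def relative_perf_per_dollar(price_ratio: float, speedup: float) -> float:
    """Performance per dollar relative to one Opteron.

    Both arguments are ratios against a single Opteron (price 1, performance 1).
    """
    return speedup / price_ratio


# Figures quoted in the thread: roughly 3x the price, 10x to 20x the performance.
for speedup in (10, 20):
    print(f"{speedup}x faster at 3x the price -> "
          f"{relative_perf_per_dollar(3.0, speedup):.2f}x the performance per dollar")
```

At the 10x end this works out to about 3.33x, consistent with the 3x claim; at 20x it would be about 6.67x.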

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Berkeley by moro_666 · 2006-04-24 01:48 · Score: 1

they should optimize this for gcc & C code compilation.

if that were the case, perhaps my gentoo machine
would be complete before christmas:)

--

I'd tell you the chances of this story being a dupe, but you wouldn't like it.
Re:Berkeley by drgonzo59 · 2006-04-24 02:17 · Score: 2, Interesting

A fast FFT processor, for example, would make life easier for a lot of Photoshop filter users (with the help of special drivers and plugins). It would also help GNU Radio quite a bit, as well as other multimedia/signal/data processing applications.
Re:Berkeley by networkBoy · 2006-04-24 02:36 · Score: 1

But according to another /. article (http://politics.slashdot.org/article.pl?sid=06/04/24/0358210),
this will fund terrorism by allowing us to transcode media files at an absolutely astounding rate*.
-nB

* Actually this looks great for the likes of LAL, Pixar, and other video shops. I'm a die hard Intel fanboi (last used AMD on my 386sx33) and this has me looking to buy a platform....
Didn't someone try this on the memory bus once? Someone by the name of neuron? Whatever happened with that?
-nB

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:Berkeley by oringo · 2006-04-24 02:55 · Score: 1

I've seen research done on speeding up XML queries with a Xilinx FPGA. An Opteron 8xx costs about $2000 apiece, so if one of those little suckers can give you at least 3x the performance of an Opteron doing SQL queries, I say we have a good contender for database applications!
Re:Berkeley by drgonzo59 · 2006-04-24 03:27 · Score: 1

Even though you mentioned terrorism as a (half?) joke, a fast FFT processor would probably be regulated by the govt. What it would mean is that people could just program, in software, a sophisticated and fast signal decoder that would normally cost tens or hundreds of thousands of dollars to buy as hardware. In a second it could all be reprogrammed into something else. So imagine having a police scanner, an HDTV, FM radio, etc. all in just a laptop with some kind of simple RF antenna input and amplifier. That would be quite insane: decoding algorithms could just be downloaded from the Internet. Check out the GNU Radio project.
Re:Berkeley by hackstraw · 2006-04-24 06:33 · Score: 2, Informative

A fast FFT processor, for example, would make life easier for a lot of Photoshop filter users (with the help of special drivers and plugins). It would also help GNU Radio quite a bit, as well as other multimedia/signal/data processing applications.

There have been tons of add-on cards that do FFTs, TCP offloading NICs, physics engines, or whatever you want. The problem is twofold: 1) these cards are expensive, or at the least non-free and non-standard compared to the rest of the computer, and they need software support to drive them; 2) they often do not deliver the advertised performance.

Take, for example, an FFT card for Photoshop filters. The image is in GPU memory and in system RAM, and it must be sent from system RAM over to the PCI card. Even if the card were sitting off HyperTransport, which is about the fastest external bus available on a PC today, it would only get 3.2 GB/s. PCI ranges from 133 MB/s up to 2133 MB/s for the new second-generation PCI-X. It's common for memory buses to exceed 3.2 GB/s, and some of the new Itaniums have something like 10.5 GB/s memory buses now.
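To put those numbers in perspective, here is a rough sketch of how long it takes just to move an image to a card and back at each quoted peak rate. The 100 MB image size is hypothetical, the round trip assumes the result is the same size as the input, and real sustained transfers usually fall short of peak:

```python
# Peak bus rates quoted in the comment above, in MB/s.
BUS_MB_PER_S = {
    "PCI (32-bit, 33 MHz)": 133,
    "PCI-X 2.0": 2133,
    "HyperTransport": 3200,
    "Itanium memory bus": 10500,
}


def round_trip_ms(size_mb: float, bandwidth_mb_s: float) -> float:
    """Milliseconds to copy size_mb to a card and the same amount back, at peak rate."""
    return 2 * size_mb / bandwidth_mb_s * 1000


IMAGE_MB = 100  # hypothetical image size, chosen only for illustration
for bus, rate in BUS_MB_PER_S.items():
    print(f"{bus:22s} {round_trip_ms(IMAGE_MB, rate):8.1f} ms")
```

At 100 MB that is roughly 1.5 s over plain PCI versus 62.5 ms over HyperTransport, before the card has done any work.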

Back to price. These cards are a niche product, so the price has to be high because demand is low. The price can skyrocket very quickly because it's common for these cards to carry their own RAM for caching and buffering, often in the 1-4+ GB range, which is not cheap in itself.

I'm not saying that I would not welcome something like an effective FFT offloading engine, but there is so much pre- and post-processing of the data, all of which has to go through main system memory and the CPU, that the offloaders don't give you much.
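The trade-off described here can be written as a simple break-even test: offloading only pays if moving the data plus the accelerator's compute time beats just running the job on the CPU. This is a deliberately crude model (it ignores overlapping transfer with compute and any pre/post-processing), and the timings and sizes in the example are hypothetical:

```python
def offload_wins(cpu_s: float, accel_s: float, bytes_moved: float,
                 link_bytes_per_s: float) -> bool:
    """Return True if offloading to an accelerator beats running on the host CPU.

    Crude model: offload time = transfer time over the link + accelerator compute.
    Transfer and compute are assumed not to overlap.
    """
    transfer_s = bytes_moved / link_bytes_per_s
    return transfer_s + accel_s < cpu_s


# Hypothetical job: 0.5 s on the CPU, 10x faster on the card, 200 MB moved in total.
print(offload_wins(0.5, 0.05, 200e6, 133e6))   # over 133 MB/s PCI
print(offload_wins(0.5, 0.05, 200e6, 3.2e9))   # over 3.2 GB/s HyperTransport
```

Over PCI the transfer alone takes about 1.5 s, so the 10x faster card loses; at HyperTransport rates the same job finishes in about 0.11 s.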

For high-performance computing, memory bandwidth is frequently the bottleneck, and has been for years. High-end GPUs are a little different because they have had specialized buses for years (AGP and the like), and they also have the advantage of being told what to do by the CPU, given some data, then processing it internally and dumping the result straight to the monitor. The CPU does not need that data back. It's more or less a one-way operation, whereas the other offloader cards usually work both ways. Even in the case of TCP offloader cards, performance often does not keep up with software and general CPU improvements. Also, TCP offloaders don't work very well with things like software firewalls that want to get their hands on the TCP data as well.

So, I believe at this point in time, offloader cards are not too valuable. Maybe for a specific problem or set of problems, but I haven't found one that could significantly improve performance yet.

So... by Morosoph · 2006-04-23 23:41 · Score: 1, Insightful

What do folks here really want to optimise?

Rendering comes to mind, but I'm biased. But I'm sure that a glorified graphics card isn't the most interesting use...

If these become popular enough, will we be seeing a back-end to GCC for this FPGA?

--
Wikileaks, no DNS

Re:So... by BenjyD · 2006-04-23 23:53 · Score: 4, Interesting

The article mentions applications in gas and oil companies. I would guess that means things like:

- MINLP/MILP (Wikipedia article is a bit weak) and Branch and Bound optimisation for things like pipeline routing, well selection etc.
- fluid mechanics for pipeline design
- geological data-mining for finding reservoirs
Those kinds of jobs can have runtimes measured in days and weeks, so an accelerator could make a real difference.
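To make the branch-and-bound item concrete, here is a minimal sketch that treats well selection as a 0/1 knapsack: each candidate well has a hypothetical value and cost, and the search picks the subset with the highest total value within a budget. Real MINLP/MILP models for pipelines and wells are far richer; the model, names, and numbers here are illustrative assumptions only:

```python
from typing import List, Tuple


def select_wells(values: List[float], costs: List[float],
                 budget: float) -> Tuple[float, List[int]]:
    """Choose the subset of candidate wells with the highest total value within budget.

    Depth-first branch and bound on a 0/1 knapsack model. The bound for a
    subtree is the fractional (LP) relaxation over the undecided wells.
    Costs are assumed to be positive.
    """
    n = len(values)
    # Visit wells in order of value per unit cost so the fractional bound is greedy.
    order = sorted(range(n), key=lambda i: values[i] / costs[i], reverse=True)
    best_value = 0.0
    best_set: List[int] = []

    def bound(k: int, value: float, room: float) -> float:
        # Fill the remaining budget greedily, taking a fraction of the first well that doesn't fit.
        for i in order[k:]:
            if costs[i] <= room:
                room -= costs[i]
                value += values[i]
            else:
                return value + values[i] * room / costs[i]
        return value

    def branch(k: int, value: float, room: float, chosen: List[int]) -> None:
        nonlocal best_value, best_set
        if value > best_value:
            best_value, best_set = value, list(chosen)
        if k == n or bound(k, value, room) <= best_value:
            return  # prune: nothing in this subtree can beat the best found so far
        i = order[k]
        if costs[i] <= room:  # branch 1: drill well i
            chosen.append(i)
            branch(k + 1, value + values[i], room - costs[i], chosen)
            chosen.pop()
        branch(k + 1, value, room, chosen)  # branch 2: skip well i

    branch(0, 0.0, budget, [])
    return best_value, sorted(best_set)


# Hypothetical candidates: values and costs in arbitrary units, budget of 50.
print(select_wells([60, 100, 120], [10, 20, 30], 50))
```

For these numbers the best choice is wells 1 and 2 (total value 220); the bound lets the search discard any subtree whose relaxed value cannot exceed the best selection found so far.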
Re:So... by bhima · 2006-04-23 23:55 · Score: 2, Interesting

I would dearly love a cryptoprocessor, and looking at the specs it doesn't look that far away.

--
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Re:So... by SlashNut · 2006-04-24 03:35 · Score: 0

Speeding up BLAST (Smith-Waterman) genetic database searches. There is a commercial product (FPGA + software) that sells for $30k and does this. You get a 50x-100x speedup (NOT PERCENT!).
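For reference, Smith-Waterman is the exact dynamic-programming local-alignment algorithm, while BLAST is a faster heuristic that approximates it. Below is a minimal score-only sketch with a linear gap penalty; the scoring values (+2 match, -1 mismatch, -1 gap) are arbitrary assumptions. Each cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal can be computed at once, which is the kind of parallelism a hardware accelerator can exploit:

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # a[i-1] aligned against a gap in b
            left = H[i][j - 1] + gap   # b[j-1] aligned against a gap in a
            H[i][j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("ACGT", "TTACGTAA"))  # ACGT matches exactly: 4 x 2
```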
Re:So... by Acromion · 2006-04-24 05:32 · Score: 0

Seismic data processing is one of the best uses for this technology. If we could simply have a chip that does FFTs or matrix inversion very quickly, we could reduce run times greatly. 3D seismic processing is incredibly CPU-hungry due to large datasets, high sample rates and the need to really massage the data before it's useful.
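Since FFTs come up repeatedly in this thread, here is a minimal recursive radix-2 Cooley-Tukey FFT, checked against a direct DFT. It is a textbook sketch, not how a hardware FFT engine is actually built, and it assumes the input length is a power of two:

```python
import cmath
from typing import List


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x: List[complex]) -> List[complex]:
    """Direct O(n^2) DFT, used here only to check the FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


signal = [1, 2, 3, 4, 0, 0, 0, 0]
print(max(abs(a - b) for a, b in zip(fft(signal), dft(signal))))  # tiny rounding error
```

For n points the FFT needs about (n/2)·log2(n) butterflies instead of the n² multiply-adds of the direct DFT, and the butterflies within one stage are independent of one another, which is what makes the algorithm a good fit for pipelined hardware.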

--
Open source is like a British car. Not only can I get under the hood, I seem to spend a lot of time here.
Re:So... by try_anything · 2006-04-24 19:40 · Score: 1

will we be seeing a back-end to GCC for this FPGA?
Hardly. The languages for which gcc has front-ends (C, Fortran, C++, Ada, etc.) are heavily biased toward CPUs that process a stream of instructions: load, store, add, and, branch, compare, etc. The highly parallelized and pipelined designs that make FPGAs so much faster than microprocessors can't be expressed in software languages, and producing a good hardware design that is equivalent to a given C program is, well, a much harder problem than creating hardware designers who can design in VHDL and Verilog.
Put another way, getting a good hardware design from a good software developer is as likely as getting a good software design from a bad software developer, because it's just as hard. (The first company that cracks either of these nuts will instantly become a household name, so don't worry about missing it.) Hopefully, as FPGAs reduce the barrier to entry in hardware design, the number of VHDL/Verilog designers will rise to meet demand.
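A toy illustration of the gap described above. A sum written as a C loop is a chain of n dependent additions, while a hardware designer would typically lay out an adder tree whose depth grows only as log2(n). This Python sketch just counts the dependency depth of each shape; it is not HDL and says nothing about what any particular compiler or synthesis tool would generate:

```python
from typing import List, Tuple


def sequential_sum(xs: List[int]) -> Tuple[int, int]:
    """Sum like a C loop; returns (total, length of the chain of dependent adds)."""
    total, steps = 0, 0
    for x in xs:
        total += x  # each add must wait for the previous result
        steps += 1
    return total, steps


def tree_sum(xs: List[int]) -> Tuple[int, int]:
    """Sum as a balanced adder tree; returns (total, number of tree levels).

    Every add within a level is independent of the others, so in hardware a
    level could be a bank of adders working side by side.
    """
    level, depth = list(xs), 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element passes straight to the next level
        level, depth = nxt, depth + 1
    return (level[0] if level else 0), depth


data = list(range(1024))
print(sequential_sum(data))  # 1024 dependent adds
print(tree_sum(data))        # 10 levels
```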
Re:So... by Anonymous Coward · 2006-04-25 05:34 · Score: 0

I haven't met a large e-commerce site admin who is happy with their SSL performance... something like this could really increase the performance of your shopping cart and web site in general.

Microsoft eyeing up these by LiquidCoooled · 2006-04-23 23:42 · Score: 0, Redundant

A specialist co processor for processing your spyware and spam would be double plus good.

I actually like the idea of a co processor sitting there, but I wonder why you wouldn't just stick another opteron in and write custom code?

--
liqbase :: faster than paper

Kick ass synth? by Max+Romantschuk · 2006-04-23 23:43 · Score: 3, Interesting

This could really be an interesting way to boost real-time soft synths... Even with top-of-the-line processors, the more complex ones will bring a CPU to its knees. Seems like a more sensible option than a DSP-filled expansion card. Too bad this thing is still a little on the expensive side for a viable market on the music software side.

--
.: Max Romantschuk :: http://max.romantschuk.fi/

Re:Kick ass synth? by alienw · 2006-04-24 00:15 · Score: 1, Informative

An FPGA does not make a very good DSP for the price. I suppose if it's one of the nicer ones from the Virtex series, you can get it to do DSP, but it won't be as good as the processor already in the PC. I'd say your best bet would be hacking a video card to do the synth stuff; it's optimized for the kind of parallel computation that DSP requires.
Re:Kick ass synth? by ettlz · 2006-04-24 00:24 · Score: 1

Too bad this thing is still a little on the expensive side for a viable market on the music software side.
...as compared to a ProTools DSP card?
Re:Kick ass synth? by Pulzar · 2006-04-24 02:06 · Score: 3, Interesting

An FPGA does not make a very good DSP for the price. I suppose if it's one of the nicer ones from the Virtex series, you can get it to do DSP, but it won't be as good as the processor already in the PC.

That's not true at all. An FPGA will not be as good a general-purpose DSP as a custom-made DSP, but it will still be better than a CPU: even the low-cost Cyclone II comes with 150 dedicated multipliers coupled with embedded memory, so it can do parallel multiply/accumulate at 700+ MHz. And these are the low-end FPGAs...

Now, if you're actually programming the FPGA with custom-designed circuitry optimized for the task you're working on, the FPGA will work a lot better than a general-purpose DSP, and be way ahead of an even more general-purpose CPU. That's why you don't see generic DSPs being used in heavy DSP work (say, in telcos), but custom and semi-custom ASICs, and FPGAs in smaller environments.

--
Never underestimate the bandwidth of a 747 filled with CD-ROMs.
Re:Kick ass synth? by alienw · 2006-04-25 13:51 · Score: 1

Have you ever used one? It's tough to get the timing much above 100MHz. Even a 66 MHz PCI interface is extremely difficult to implement. No way those multipliers in a Cyclone will work at 700MHz. Maybe in a $150 Virtex-4 chip, but still unlikely. Besides, how the hell are you going to get that much data into the chip? There's a reason they don't put 150 parallel multipliers in a CPU, and it's not because they can't.

That's why you don't see generic DSPs being used in heavy DSP work (say, in telcos), but custom and semi-custom ASICs, and FPGAs in smaller environments.

Telcos need simple DSP, often on a large number of channels, and it needs to be done with precise timing. That's why FPGAs and ASICs are used. A dedicated DSP chip would not integrate well into the application, but it would certainly have enough power.

Why read a re-written press release by Threni · 2006-04-23 23:46 · Score: 4, Informative

when you can just read about it on the company's website?

http://www.drccomputer.com/pages/products.html

Hoo boy by Anonymous Coward · 2006-04-23 23:46 · Score: 0, Troll

The Register in "press release run as News Item" shock.

Quality? by wetfeetl33t · 2006-04-23 23:49 · Score: 1, Offtopic

Yup, seems like a pretty neat piece of hardware. The only thing I'd be worried about is quality. All of these alternative processors usually seem too good to be true, until you use them. At work, we ended up buying 15 computers with a similar item, and they have been nothing but trouble. They underperform, they break, etc. Granted, this may be a high-quality product, but I sure won't buy one right away.

--
Register the editry.

Re:Quality? by Anonymous Coward · 2006-04-24 09:58 · Score: 0

Because they seem to overheat, underperform, overcharge, and used to require things like Rambus.

Speed by Morosoph · 2006-04-23 23:51 · Score: 1

why you wouldn't just stick another opteron in and write custom code?

Well, here are two reasons that come to mind:

Dedicated Hardware goes one hell of a lot faster.
"The first set of DRC modules will consume about 10 - 20 watts versus close to 80 watts for an Opteron chip."

--
Wikileaks, no DNS

Re:Speed by pla · 2006-04-24 00:09 · Score: 2, Insightful

Dedicated Hardware goes one hell of a lot faster.

An FPGA doesn't equal dedicated hardware. It takes a performance hit (in some domains, a huge hit) in exchange for flexibility. It also requires code that supports it.

The first set of DRC modules will consume about 10 - 20 watts versus close to 80 watts for an Opteron chip.

People buying USD$5000 coprocessors, plus the cost of developing specialized code to use them, don't cut corners on the basis of their electric bill.
Re:Speed by 10Ghz · 2006-04-24 02:20 · Score: 1

"An FPGA doesn't equal dedicated hardware. It takes a performance hit (in some domains, a huge hit) in exchange for flexibility."

Well, Cray is using FPGAs as dedicated co-processors in some of their supercomputers. So they can be quite fast indeed.

--
Lesbian Nazi Hookers Abducted by UFOs and Forced Into Weight Loss Programs - -all next week on Town Talk.
Re:Speed by Anonymous Coward · 2006-04-24 03:33 · Score: 0

Power consumed == heat generated. Less power consumption means higher rack densities. That's good for facilities with limited data center space, for larger operations, and for higher-density clusters. Even with some of the more 'out there' cooling solutions (Liebert's XD system, etc.), blade configurations are often limited by heat load. The cost of power in a data center is higher than just the cost of the power the CPU itself consumes.
Re:Speed by dubl-u · 2006-04-24 03:55 · Score: 3, Interesting

The first set of DRC modules will consume about 10 - 20 watts versus close to 80 watts for an Opteron chip.
People buying USD$5000 coprocessors, plus the cost of developing specialized code to use them, don't cut corners on the basis of their electric bill.

You're doing the math wrong. For decent colo space, I pay somewhere around $150 per rack-unit year and $120 per amp-year. If the coprocessor is really 10-20x faster for my workload, I don't just save the half-amp on one coprocessor; I get the savings on servers I don't need. Just in rack and power costs, one coprocessor would save at least $4k per year.

In other words, for its target audience, the $5k coprocessor would be more than paid for by the infrastructure savings alone. If you're the kind of company that is buying a few $5k coprocessors to replace $100k in servers, I hope you're thinking about your electricity bill, as it will be more than $25k over the lifespan of those machines.
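dubl-u's colo rates can be plugged into a small model. Only the $150 per rack-unit-year and $120 per amp-year rates come from the comment; the number of servers avoided, rack units per server and amps per server below are hypothetical inputs:

```python
def annual_colo_savings(servers_avoided: int, rack_units_each: float, amps_each: float,
                        per_ru_year: float = 150.0, per_amp_year: float = 120.0) -> float:
    """Yearly rack-space plus power cost of servers you no longer need to host."""
    per_server = rack_units_each * per_ru_year + amps_each * per_amp_year
    return servers_avoided * per_server


# Hypothetical: a 10x speedup lets one box do the work of ten, so nine 2U servers
# drawing 2 A each are no longer needed.
print(annual_colo_savings(servers_avoided=9, rack_units_each=2, amps_each=2))
```

With those assumptions the savings come to $4,860 a year, before counting the purchase cost of the servers themselves.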

A bit more accurate summary by subreality · 2006-04-23 23:54 · Score: 5, Informative

They basically made an FPGA (field-programmable gate array) that plugs directly into HyperTransport (the Opteron CPU bus). FPGAs can solve many problems far more efficiently than a general-purpose processor. This has been done with PCI cards before, but PCI is too slow for many uses. Giving it direct access to HT solves that problem.

That's a pretty cool niche.

Re:A bit more accurate summary by Metabolife · 2006-04-24 00:13 · Score: 1

The bigger question: what to do with all those NAND gates?
Re:A bit more accurate summary by Noehre · 2006-04-24 00:35 · Score: 1

There are Hypertransport add-in connectors (HTX connectors) on some server motherboards that would be much better suited for this sort of application.
Re:A bit more accurate summary by o'reor · 2006-04-24 03:14 · Score: 1

That's an easy one : write a VHDL app, generate the corresponding firmware, build a Linux driver for it, there you go.
Additional question : are there any generic driver templates for Hypertransport-based devices ?

--
In Soviet Russia, our new overlords are belong to all your base.
Re:A bit more accurate summary by Wesley+Felter · 2006-04-24 03:23 · Score: 1

I don't know about that; very few motherboards have HTX slots, but lots of motherboards have multiple processor sockets.
Re:A bit more accurate summary by rsun · 2006-04-24 06:30 · Score: 1

Ah, but unless you've rewritten the BIOS to expect a (presumably non-coherent) HT device in that socket, you're SOL. Any motherboard with an HTX slot should have a BIOS designed to expect an arbitrary non-coherent device in that slot. Most Opteron motherboard BIOSes that I've had experience with have a fixed topology that basically says the HT link between CPUs is either not connected (no CPU present) or fully coherent (CPU present). In fact, most BIOSes don't even allow different speed and rev levels of Opterons in the sockets. It's possible that these guys have licensed coherent HT from AMD (which, last I checked, was not openly available), but developing a coherent HT device is a lot harder than a non-coherent one (just ask Newisys).
Re:A bit more accurate summary by ThatFunkyMunki · 2006-04-24 06:33 · Score: 1

Explain. HT is quite possibly the greatest advance in processor technology since on-chip cache.

--
If patriotism is racist, is racism patriotic?
Re:A bit more accurate summary by Khyber · 2006-04-24 10:01 · Score: 1

So, for all of what you've said, this is basically a piece of hardware acceleration for a CPU? First we had sound acceleration, then 3d video acceleration, now we have CPU acceleration? If this is basically what it boils down to - why not just build the damned thing onto the die, like we did with the math co-processors?

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Re:A bit more accurate summary by Anonymous Coward · 2006-04-24 12:14 · Score: 0

I've done exactly this, with a board called the Pamette. Non-trivial doesn't even begin to describe it.
Re:A bit more accurate summary by wed128 · 2006-04-24 12:51 · Score: 1

Because it is an expensive piece of very niche hardware. The average user doesn't need extremely fast FFT, DSP, or whatever operations on the cpu.

The reason math coprocessors (like the 487) got built into the die was that as more and more people used things like Photoshop, floating-point performance moved out of the realm of 'niche' and into the mainstream. All in all, most people would be better off with a second Opteron in the server to hand out web pages and e-mail, rather than this, which is better suited to specialty applications like scientific computing or signal processing.
Re:A bit more accurate summary by gfody · 2006-04-24 17:46 · Score: 1

Could Open Graphics run on it?

--

bite my glorious golden ass.

price performance by sfraggle · 2006-04-23 23:56 · Score: 4, Funny

From the article:

"We have taken the approach that we must deliver three times the price-performance of a standard blade."

Isn't this BAD? Three times the Price/performance ratio would imply a higher price, or worse performance.

--
were you expecting to see a sig here? perhaps you'd rather see the inside of an ambulance!

Re:price performance by maxwell+demon · 2006-04-24 00:24 · Score: 1

I guess you'd also take the job with the lower monthly wage, because a smaller time/money ratio means more money in the same time, right?

--
The Tao of math: The numbers you can count are not the real numbers.
Re:price performance by knightmad · 2006-04-24 00:46 · Score: 1

Is that you, BadAnalogyGuy? Monthly wage = money/time, and not time/money.
Re:price performance by maxwell+demon · 2006-04-24 01:35 · Score: 1

And price-performance = performance/price, and not price/performance?

--
The Tao of math: The numbers you can count are not the real numbers.
Re:price performance by poot_rootbeer · 2006-04-24 03:27 · Score: 1

They didn't say "price/performance ratio", they said "price-performance".

Which I would assume means price TIMES performance, or perhaps the dash should be taken literally as price MINUS performance.
Re:price performance by adrianbaugh · 2006-04-24 05:02 · Score: 1

That's why it costs $5k :-) The cost to them was, like, $200, but they just had to stick to that price/performance ratio...

--
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.

Cool, it's a distributed.net co-processor! by frinkacheese · 2006-04-23 23:58 · Score: 0

If it really delivers up to 10x the performance, it'll make a cool http://www.distributed.net/ node!

That'll be sure to get the team stats up!

Now, who do I call for a free demo?

Seti stats... by way2trivial · 2006-04-23 23:59 · Score: 0, Offtopic

previously I'd decided I was willing to pay a hunnert bucks per free slot on my machine for boards that could process BOINC faster..

(they'd need fans though)

I'm in the top 3% worldwide.. and so are the 18,055 people above me.
And I don't believe I'll ever see top 1%

--
every day http://en.wikipedia.org/wiki/Special:Random

Re:Seti stats... by Morosoph · 2006-04-24 00:11 · Score: 1

I'm doing Folding@Home, myself. I don't mind spare capacity being used, but I was thinking more selfishly, I'll have to admit :o)

Flight simulation might be fun. Not just the graphics: air turbulence, AI for other aircraft, birds, etcetera...

--
Wikileaks, no DNS
Re:Seti stats... by RecycledElectrons · 2006-04-24 05:48 · Score: 1

> I'm in the top 3% worldwide.. and so are the 18,055 people above me.
> And I don't believe I'll ever see top 1%

I also doubt that you'll ever see the top 1%.

If there are six billion (6,000,000,000) people in this world, then the top 3% is one-hundred-and-eighty-million (180,000,000.)

Andy Out!
Re:Seti stats... by wed128 · 2006-04-24 12:40 · Score: 1

I think he means the top 3% of people who donate to the SETI project.

Er.... question by brunes69 · 2006-04-24 00:00 · Score: 1, Interesting

"DRC's flagship product is the DRC Coprocessor Module that plugs directly into an open processor socket in a multi-way Opteron system," the company notes on its web site.

If you have an open Opteron socket on your multi-way box, wouldn't you probably achieve better performance by shoving another Opteron into there?

I mean, sure, I can see the benefit of having a co-processor customized to handle your specific workload. But another Opteron would likely run at multiples of the clock speed of that thing, and it would also be able to offload work from the *other* Opterons, such as disk I/O etc., giving your overall application more performance.

Re:Er.... question by Anonymous Coward · 2006-04-24 00:03 · Score: 0

Yeah, I'm sure they never thought of that, and are reading your comment and going "Doh!" right now.
Re:Er.... question by kinnell · 2006-04-24 00:23 · Score: 3, Informative

another Opteron would likely run at multiples of the clock speed of that thing, and it would also be able to offload work from the *other* Opterons, such as disk I/O etc., giving your overall application more performance.
Clockspeed is not a measurement of performance unless you are comparing similar architectures. With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.
In practical terms, this product lends itself to compute intensive tasks such as signal processing, not data serving.

--
If I seem short sighted, it is because I stand on the shoulders of midgets
Re:Er.... question by brunes69 · 2006-04-24 00:41 · Score: 2, Informative

With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.
True, but if the microprocessor's clock speed is hundreds of thousands of times faster than the FPGA's, then you are even again. There's no clock speed for this device in the article, so we can't really compare.
Re:Er.... question by andrewmc · 2006-04-24 01:43 · Score: 3, Interesting

True, but if the microprocessor's clock speed is hundreds of thousands of times faster than the FPGA's, then you are even again. There's no clock speed for this device in the article, so we can't really compare.

Clock speed often depends on the circuit design put onto the FPGA. If you got your FPGA design running at even 100MHz (not unrealistic), you're maybe 30 times behind a general-purpose CPU. But not only are you running hundreds of instructions per cycle, but those instructions are specific to the application and probably many times more efficient.

It's probably not useful for making short-lived applications faster, but for seriously repetitive number-crunchy work like weather predictions, oil drilling, etc, where there are trillions of small-scale computations, the highly-parallel nature of the FPGA has great potential.

Also, if those small-scale computations need to interact for any reason, on-chip communication is far faster than any chip-to-chip could be. And that's happening in parallel, too.
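The clock-speed argument in this sub-thread comes down to one multiplication: peak throughput is clock rate times operations completed per cycle. A sketch using the figures above; these are peak numbers only and ignore how the data gets into the chip, which alienw raises earlier in the thread:

```python
def peak_ops_per_s(clock_hz: float, ops_per_cycle: float) -> float:
    """Peak throughput: clock rate times operations completed per cycle."""
    return clock_hz * ops_per_cycle


# ~3 GHz is implied by "100MHz ... maybe 30 times behind"; 3 ops/cycle is the
# upper end of the "2 or 3" quoted above; 200 ops/cycle is a hypothetical
# stand-in for "hundreds".
cpu = peak_ops_per_s(3.0e9, 3)
fpga = peak_ops_per_s(100e6, 200)
print(f"CPU:  {cpu:.1e} ops/s")
print(f"FPGA: {fpga:.1e} ops/s ({fpga / cpu:.1f}x the CPU)")
```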
Re:Er.... question by kinnell · 2006-04-24 01:46 · Score: 3, Informative

The Virtex 4 FPGAs can be clocked at up to 500MHz, so we are talking about ~10-15 times slower than the processor, depending on the application. Even a simple digital filter would be faster when implemented in the FPGA, and this would only take a small fraction of the FPGA resources.
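As an example of the "simple digital filter" case, here is a direct-form FIR filter sketch. Every output sample is a sum of one multiply-accumulate per tap; on a CPU those run in sequence, whereas a hardware implementation can give each tap its own multiplier and produce one output per clock. The code models the arithmetic only, not an FPGA design:

```python
from typing import List


def fir_filter(x: List[float], h: List[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        # One multiply-accumulate per tap. A CPU runs these one after another;
        # a hardware version can give each tap its own multiplier.
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


print(fir_filter([1, 0, 0, 0], [0.5, 0.25]))  # impulse in, filter taps out
```

As a hypothetical sizing example, a 64-tap filter producing one output per clock at 100 MHz performs 64 × 100 million = 6.4 billion multiply-accumulates per second.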

--
If I seem short sighted, it is because I stand on the shoulders of midgets
Re:Er.... question by ChrisA90278 · 2006-04-24 06:00 · Score: 1

"If you have an open Opteron socket on your multi-way box, wouldn't you probably achieve better performance by shoving another Opteron into there?" No. Let's assume you are computing an FFT on some data and the Opteron CPU can do (say) 100 per second. Adding a second CPU would at best get you up to 200 per second. But hardware FFTs can go maybe 10X faster than software FFTs, so with the new chip you can do 1100 per second. With a general-purpose CPU there needs to be balance, but if you know you will be computing FFTs you might build 16 or 32 hardware multiply units on the chip and leave off just about everything else. The chip only needs to be 3X faster than a CPU to look good, but for specialized applications 10X is not hard to get.
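The comparison above as a tiny model, using the comment's own illustrative numbers. It assumes the work splits perfectly between the host and the accelerator with no contention for memory or the bus, which is the best case for both options:

```python
def total_ffts_per_s(cpu_rate: float, cpus: int, accelerators: int = 0,
                     speedup: float = 10.0) -> float:
    """Aggregate FFTs per second, assuming work splits perfectly across all units.

    cpu_rate is one CPU's FFT rate; each accelerator runs speedup times faster.
    """
    return cpus * cpu_rate + accelerators * speedup * cpu_rate


# The example above: 100 FFTs/s per Opteron, hardware FFTs ~10x faster.
print(total_ffts_per_s(100, cpus=2))                  # second Opteron in the socket
print(total_ffts_per_s(100, cpus=1, accelerators=1))  # coprocessor in the socket
```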

The new thing is the hype(r)transport by nietsch · 2006-04-24 00:02 · Score: 3, Insightful

There are plenty of others that have tried this, and plenty of them failed. An FPGA has a significantly slower clock speed, and you need fairly sophisticated software to make the most of the flexible design. Before this thing came out, it usually turned out to be cheaper to buy more horsepower and stay on a regular hardware/programming path than to risk it with special hardware and software.
These guys claim their stuff is cheaper than more horsepower, and that you get the extra speed boost from HyperTransport (over PCI).
It clearly is a press release that has been regurgitated by a lazy journalist, as I found few if any critical notes, something this product might deserve. For one thing, I don't see how they have solved the special software and programmer problem, or how they have really tackled the economies of scale: this thing costs a couple of grand, versus a couple of hundred for a top-notch AMD processor. The regular processor has two cores and runs an order of magnitude faster than the FPGA. The scarcity of programmers who can write software for this thing adds another order of magnitude to the wrong side of the equation.
Roughly, the FPGA solution must be a thousand times quicker/better than the regular-processor-with-lots-of-horsepower solution. I don't see that happening soon.

OTOH, the rosy image of a computer that can render a Pixar animation in a few minutes, then be used the next minute as a real-time sound processor, or simulate a neural net with as many neurons as the human brain: that makes the geek in me drool. Computer, tell me it isn't so!

--
This space is intentionally staring blankly at you

Re:The new thing is the hype(r)transport by wgaryhas · 2006-04-24 02:17 · Score: 1

A top-of-the-line AMD Opteron costs about 2 grand if it supports 4 to 8 processors. The top-of-the-line 2-way and single-processor parts cost 1 grand.

--
"For every complex problem, there is a solution that is simple, neat, and wrong." - H.L. Mencken
Re:The new thing is the hype(r)transport by elvum · 2006-04-24 08:48 · Score: 1

I'd like to see an Open Source solution that greatly facilitates the creation of FPGA firmware. Perhaps something that could compile/synthesise one of the many excellent modern functional languages. Anyone out there up to the challenge? :-)

High end gaming? by Barny · 2006-04-24 00:03 · Score: 3, Interesting

Even though I only know of 3 people who use socket 940 machines for gaming (2 of them dual-CPU rigs), I believe an Ageia PhysX processor modded to fit the socket would be a good idea. The combination of an extremely fast CPU-PPU bus with the ability to use stock (well, registered ECC RAM is kinda stock) RAM to feed it would help make multi-socket Opterons a very viable gaming platform, although as those 3 peeps (and I, after seeing the BoM) know, it would not be a cheap one.

--
...
/me sighs

neural networks or java? by Janek+Kozicki · 2006-04-24 00:06 · Score: 3, Interesting

I'm not a fan of Java, but imagine JVM programmed into such co-processor on the hardware level (which it should be capable of). I bet it would be a very interesting option for some people. Servers running on java, anyone?

But I'm a fan of neural networks, and I imagine that if such coprocessor was programmed exactly to perform NN tasks it could bring "brain simulation" a few steps closer - especially if many such coprocessors were put into the system.

--
# #\ @ ? Colonize Mars #

Re:neural networks or java? by Anonymous Coward · 2006-04-24 00:23 · Score: 4, Informative

Java co-processor: it has been tried before, with negative success. Main reason: it turns out that compiling byte-code to CISC CPU assembler and running the native code gives more speed than executing byte-code directly.
In the late '90s, I got burned in precisely such a start-up. We built an ASIC piggy-back Java bytecode CPU. It worked... as a proof of concept. It didn't give much of a performance boost: at best, in the 20-30% range. No one wanted it.
Re:neural networks or java? by Zemplar · 2006-04-24 00:45 · Score: 1

"...Servers running on java, anyone?"

Why not? This sounds perfectly complementary to me since most Sys Admins also run on java.
Re:neural networks or java? by arivanov · 2006-04-24 01:03 · Score: 1

java, anyone?
Azul does that, but with fully specialized hardware. No idea if you can take their core unit and transplant it into an Opteron socket.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:neural networks or java? by jamesh · 2006-04-24 01:19 · Score: 1

Just out of curiosity, how did the power usage and heat compare to the alternative?
Re:neural networks or java? by Anonymous Coward · 2006-04-24 01:34 · Score: 0

We didn't shoot for power efficiency. It was a pre-bubble start-up, and hardly anyone cared about power and/or heat, at least in our niche. We had two models; the highest-performing one had liquid cooling.

What about sdram slots by mpn14tech · 2006-04-24 00:08 · Score: 1

I had never thought about using a Hypertransport connection to attach an FPGA to a CPU, but I had often wondered about fitting an FPGA into an SDRAM socket. You just write your block of data out to memory and read the results back.
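The "write a block, read the results back" idea can be modelled in a few lines. This is a toy sketch only: the memory-slot device, its buffer layout (input in the first half, output in the second) and the "double every word" operation are all hypothetical stand-ins for whatever logic the FPGA would actually hold.

```python
import struct

class DimmCoprocessor:
    """Hypothetical FPGA sitting in a memory slot, modelled as a flat byte array.

    Assumed layout: first half of the window = input buffer,
    second half = output buffer.
    """

    def __init__(self, size: int = 64):
        self.mem = bytearray(size)
        self.half = size // 2

    def write(self, offset: int, data: bytes) -> None:
        self.mem[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.mem[offset:offset + length])

    def process(self) -> None:
        # Stand-in for the FPGA logic: double every 32-bit word of the input.
        n = self.half // 4
        words = struct.unpack(f"<{n}I", self.mem[:self.half])
        out = struct.pack(f"<{n}I", *((w * 2) & 0xFFFFFFFF for w in words))
        self.mem[self.half:] = out

dev = DimmCoprocessor()
dev.write(0, struct.pack("<4I", 1, 2, 3, 4))           # host writes its block out to "memory"
dev.process()                                          # device computes
result = struct.unpack("<4I", dev.read(dev.half, 16))  # host reads the results back
print(result)  # (2, 4, 6, 8)
```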

Re:What about sdram slots by Brietech · 2006-04-24 04:33 · Score: 2, Interesting

There is actually a lot of active research going on in this field right now. It is called "processor-in-memory" (PIM) architecture, and it's best for handling things like array-based calculations, which would otherwise need a large number of off-chip memory accesses to complete. Staying completely on-chip makes it much faster, and it allows the embedded processor to take advantage of the internally wide (~256-bit) data path of modern memory. Look up Project DIVA and Project MONARCH; it is all DARPA-sponsored research, and the university I attend (USC) has a number of researchers involved with it.

--
I'm perfect in every way, except for my humility.
Re:What about sdram slots by Jan · 2006-04-24 04:40 · Score: 1

It has been done several times, for example Nuron AcB, Pilchard, and SmartDIMM.

I've an idea! by the_humeister · 2006-04-24 00:09 · Score: 1

How about programming it as an x86 processor and then booting from it? That would be pretty interesting.

Fair points by Morosoph · 2006-04-24 00:16 · Score: 3, Insightful

However, it's still a middle ground between non-programmable dedicated hardware and another CPU.

Also, power consumption matters if you've got a rack of these things in a small space and need to keep them cool. Five times as many systems might need a larger server room.

--
Wikileaks, no DNS

Open protocols win! by maxwell+demon · 2006-04-24 00:19 · Score: 4, Insightful

I think the most important sentence in the article is this:

AMD's decision to open Hypertransport could end up being a key factor in Opteron's future success.

--
The Tao of math: The numbers you can count are not the real numbers.

Analog data analysis and general calculus, IMO by CFD339 · 2006-04-24 00:31 · Score: 4, Insightful

The sweet spot for plug-ins like this, IMO, would be similar to what a few board manufacturers are doing now: digital signal processing routines like Fourier transforms and other general calculus functions, used in all kinds of data analysis where raw data comes in as analog variations, or where moment-by-moment changes in state need to be modeled for engineering applications like fluid dynamics and harmonics.
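For a feel of the kind of routine that would be offloaded, here is a textbook radix-2 Cooley-Tukey FFT in plain Python. It is a sketch for clarity only; a DSP board or FPGA would implement the same butterflies as a pipelined datapath, not as recursive software. Input length is assumed to be a power of two.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# A pure tone at bin 1 of an 8-point signal shows up as a single spike.
signal = [cmath.exp(2j * cmath.pi * t / 8) for t in range(8)]
spectrum = fft(signal)
print([round(abs(c), 6) for c in spectrum])  # [0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```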

I'd imagine you'll need to have the application compiled in such a way that it is aware of the additional processing capability, so it's not likely to be a plug-n-pray solution to your general game player's graphical wet dreams.

--
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln

About Time! by evilviper · 2006-04-24 00:39 · Score: 4, Interesting

I have to say, I'm surprised it has taken so long. Seems a few years past-due, IMHO.

One of the first signs that PCs needed an FPGA or similar was hardware MPEG capture cards... They could do the job so much faster, and so much cheaper, than your primary CPU that the alternative is disappearing.

High-end graphics cards have been the most telling development. It's not that OpenGL is something magical; it's just that an ASIC can do many things so much better than a CPU that transferring much, much more raw data over the bus was still cheaper than actually processing it on the CPU (despite the fact that interrupts are rather costly themselves).

PS2 clusters, crypto cards, hardware-accelerated NICs, SLI: all are symptoms of almost exactly the same problem...

The rising popularity of GPU programming made it extremely clear that there is a vacuum here. Using the video card isn't a very good way to accomplish this, just a stop-gap necessity. I thought from the beginning that FPGAs would become like the old math coprocessors and have their own motherboard socket, but neither AMD nor Intel stepped up to fill this clear need. Installing it into a normal CPU socket, to get around this apathy, is a very clever hack I hadn't thought of.

I expect that, with popularity, it will become cheaper to put a custom FPGA socket on motherboards than to build a full-fledged SMP motherboard for the purpose. After that, who knows... Maybe FPGAs will go the way of the math coprocessors and get integrated into future CPUs.

I know if I was running ATI or NVidia (or Hauppauge, or Level5), I'd be very worried about this thing eating the most profitable segment of my market.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

Re:About Time! by TheRaven64 · 2006-04-24 00:55 · Score: 4, Interesting
I was at a talk by Bob Colwell a few weeks ago. One of the points he made was that within the next ten years we will be able to (economically) fit far more transistors on a chip than we realistically know what to do with. His example was using all of that space to have a vast array of P6 cores. If you did this, then:
1. You would not be able to get enough power to the chip to make it work.
2. You would not be able to dissipate the heat that it would produce.
3. You would not be able to get enough data to it for more than about 10% (on a good day) of the chip real-estate to be actually doing anything.
One possible solution is to have a hundred or so general purpose cores, and fill the rest up with simple algorithmic accelerators (e.g. FFT, crypto, [i]D{C,W}T, etc). These would spend most of their time turned off (not using power), but when a workload hit the chip that needed them they could be turned on to give a significant performance boost.
--
I am TheRaven on Soylent News
Re:About Time! by afidel · 2006-04-24 03:16 · Score: 1

How do you tackle leakage current? Even if part of a chip is not in use it still uses power in current designs, and AFAIK not even clockless designs completely eliminate parasitic current loss in unused components.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:About Time! by Gyorg_Lavode · 2006-04-24 04:15 · Score: 2, Informative

They only leak if you power them. Leakage current is for transistors that are not changing states but are powered.

--
I do security
Re:About Time! by DigiShaman · 2006-04-24 17:54 · Score: 1

Excuse my ignorance, but I ask this question from the "layman" perspective.

One possible solution is to have a hundred or so general purpose cores, and fill the rest up with simple algorithmic accelerators (e.g. FFT, crypto, [i]D{C,W}T, etc). These would spend most of their time turned off (not using power), but when a workload hit the chip that needed them they could be turned on to give a significant performance boost.

Isn't that what Intel and AMD have been doing all along with MMX, SSE, SSE2, and SSE3? Basically, successive revisions adding more and more instructions to make processing more efficient? I'm sure that as more die space becomes available, future versions of SSE (versions 10, 50, 100... whatever) will include specific instructions for processing the very algorithms you mentioned.

--
Life is not for the lazy.
Re:About Time! by TheRaven64 · 2006-04-24 21:45 · Score: 1

MMX, SSE, 3DNow! and AltiVec are all still very general purpose instructions. They are just like the standard instructions that the chip can execute, except that they work on several inputs at once (e.g. do 4 adds in parallel instead of just doing one at a time). It's possible to build things like FFT quite efficiently out of these, but you are still using general purpose hardware. An FPGA running at 100MHz or so can still outperform a high-end general purpose CPU doing several of these tasks, because it can do them in a single cycle (throughput, not latency - they are heavily pipelined).
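To make "several inputs at once" concrete: a packed-SIMD add treats one wide register as several independent lanes. The sketch below emulates that in plain Python with the SWAR (SIMD-within-a-register) trick on a 64-bit word split into four 16-bit lanes. It illustrates the idea only; it is not how MMX/SSE are actually exposed to programmers.

```python
LANES, LANE_BITS = 4, 16
MASK64 = (1 << 64) - 1
# Mask with only the top bit of each 16-bit lane set.
HIGH = sum(1 << (LANE_BITS * i + LANE_BITS - 1) for i in range(LANES))

def pack(values):
    return sum((v & 0xFFFF) << (LANE_BITS * i) for i, v in enumerate(values))

def unpack(word):
    return [(word >> (LANE_BITS * i)) & 0xFFFF for i in range(LANES)]

def packed_add(a: int, b: int) -> int:
    """Four independent 16-bit wraparound adds in one 64-bit operation."""
    low = (a & ~HIGH & MASK64) + (b & ~HIGH & MASK64)  # carries cannot cross lanes
    return (low ^ ((a ^ b) & HIGH)) & MASK64           # fix up each lane's top bit

a = pack([1, 2, 3, 0xFFFF])
b = pack([10, 20, 30, 1])
print(unpack(packed_add(a, b)))  # [11, 22, 33, 0]
```

The last lane wraps around to 0 without disturbing its neighbours, which is exactly what distinguishes a packed add from one wide integer add.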

--
I am TheRaven on Soylent News

An old method, not really suitable nowadays by Flying+pig · 2006-04-24 00:41 · Score: 4, Interesting

It worked fine in the days of embedded systems, when all memory was static (and usually only 16 bits wide) and it was easy to wire up an interrupt line so you could read the results when the add-on had finished. Nowadays it is much more difficult because of the need to integrate with DRAM controllers and timing, and the absence of convenient interrupts (so you need to poll a location to see when the job has completed). Hypertransport, by contrast, is designed to do the job and do it efficiently.
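Without an interrupt, the host has to spin on a status location. A minimal sketch of that polling pattern, assuming a hypothetical device whose status word reads DONE after a few polls (the class, status values and timeout are all invented for illustration):

```python
import time

class FakeAccelerator:
    """Hypothetical device: its status word reads DONE after a few polls."""
    BUSY, DONE = 0, 1

    def __init__(self, cycles_needed: int):
        self._remaining = cycles_needed
        self._data = []
        self.result = None

    def start(self, data):
        self._data = list(data)

    def read_status(self) -> int:
        # Each status read lets the simulated hardware make some progress.
        if self._remaining > 0:
            self._remaining -= 1
            return self.BUSY
        self.result = sum(self._data)
        return self.DONE

def wait_for_completion(dev, timeout_s=1.0, poll_interval_s=0.0):
    """Spin on the status location until DONE or until the timeout expires."""
    deadline = time.monotonic() + timeout_s
    polls = 0
    while time.monotonic() < deadline:
        polls += 1
        if dev.read_status() == dev.DONE:
            return polls
        if poll_interval_s:
            time.sleep(poll_interval_s)
    raise TimeoutError("accelerator did not finish in time")

dev = FakeAccelerator(cycles_needed=3)
dev.start([1, 2, 3, 4])
polls = wait_for_completion(dev)
print(polls, dev.result)  # 4 10
```

Every poll is a bus transaction that does no useful work, which is why the lack of a convenient interrupt hurts.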

Another nice approach was the "swinging gate" RAM method in which you had two blocks of physical RAM in the same memory space. The main CPU filled one block with data, then flicked the switch so the co-processor could read that data while the CPU read the results from the other block, then put in new data for processing in the next cycle. Very easy to implement, much cheaper than FIFOs. It meant you could use a cheap DSP (from TI) in a system using a cheaper 8086 series processor for which you could get cheap tools and an embedded OS.
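The "swinging gate" is what is usually called ping-pong or double buffering. A sketch of the hand-off sequence, assuming a generic `process` function stands in for the DSP; the simulation runs the CPU half and the co-processor half of each cycle one after the other, whereas the real hardware overlaps them.

```python
def swinging_gate(batches, process):
    """Simulate "swinging gate" double buffering with two RAM blocks.

    Each cycle the CPU collects finished results from the block it owns and
    refills it with new data; then the gate swings, handing that block to
    the co-processor.
    """
    blocks = [None, None]   # two physical RAM blocks sharing one address range
    cpu = 0                 # index of the block currently switched to the CPU
    pending = list(batches)
    results = []
    while pending or any(block is not None for block in blocks):
        # CPU side: read back results left by the co-processor, then load new data.
        if blocks[cpu] is not None and blocks[cpu][0] == "result":
            results.append(blocks[cpu][1])
            blocks[cpu] = None
        if pending:
            blocks[cpu] = ("data", pending.pop(0))
        # Flick the switch: the block just filled now belongs to the co-processor.
        cpu = 1 - cpu
        coproc = 1 - cpu
        if blocks[coproc] is not None and blocks[coproc][0] == "data":
            blocks[coproc] = ("result", process(blocks[coproc][1]))
    return results

out = swinging_gate([[1, 2], [3], [4, 5, 6]], lambda xs: [x * x for x in xs])
print(out)  # [[1, 4], [9], [16, 25, 36]]
```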

--
Pining for the fjords

Memory bank? by Spazmania · 2006-04-24 00:42 · Score: 1

On Opteron motherboards each processor manages its own bank of memory and makes it available to the other processors via Hypertransport. Since this FPGA replaces one of the processors, how does it manage the associated memory bank?

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.

Re:Memory bank? by Zemplar · 2006-04-24 00:54 · Score: 1

Just a guess, but after RTFA: the included photo shows an Opteron CPU with populated memory banks and the DRC product's memory banks empty. Perhaps this is a hint?
Re:Memory bank? by SteveCasselman · 2006-04-26 10:44 · Score: 1

Since each Opteron has its own memory controller, the DRAM pins come directly into the DRC module. This gives us direct control over the attached memory. The module is simply memory-mapped into the processor's memory space. The user has an area where they plug in their logic, and you read and write into this area via some slice of the memory map. Steve Casselman, CTO DRC
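"Memory-mapped into the processor's memory space" means ordinary loads and stores are steered by address. A toy address decoder showing the idea; `USER_BASE` and `USER_SIZE` are made-up values for illustration, not DRC's actual memory map.

```python
class MemoryMappedFpga:
    """Hypothetical decoder: one slice of the physical map goes to user logic."""
    USER_BASE = 0x1_0000_0000   # hypothetical start of the user-logic slice
    USER_SIZE = 0x1000          # hypothetical size of that slice (4 KiB)

    def __init__(self):
        self.dram = {}       # ordinary attached DRAM, address -> value
        self.user_regs = {}  # user logic registers, offset -> value

    def _in_user_window(self, addr: int) -> bool:
        return self.USER_BASE <= addr < self.USER_BASE + self.USER_SIZE

    def write(self, addr: int, value: int) -> None:
        if self._in_user_window(addr):
            self.user_regs[addr - self.USER_BASE] = value  # goes to the user's logic
        else:
            self.dram[addr] = value                        # ordinary memory store

    def read(self, addr: int) -> int:
        if self._in_user_window(addr):
            return self.user_regs.get(addr - self.USER_BASE, 0)
        return self.dram.get(addr, 0)

fpga = MemoryMappedFpga()
fpga.write(0x1000, 42)                # lands in DRAM
fpga.write(fpga.USER_BASE + 0x10, 7)  # lands in the user-logic window
print(fpga.dram, fpga.user_regs)      # {4096: 42} {16: 7}
```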

You mean Like the new Cray? by jbossvi · 2006-04-24 00:46 · Score: 1

Yes, this has been done before (with a different socket, for sure). Most of those attempts failed. But this is getting picked up by others lately and seems to have legs (technologically speaking).

http://www.cray.com/products/xd1/index.html

Oh, BTW, a single 3U chassis is around $45k. For certain memory-bound calculations and some sequential algorithms, HFFPGAs (high-frequency FPGAs) work well.

Comrades! by Anonymous Coward · 2006-04-24 01:17 · Score: 0

a reprogrammable co-processor

In Soviet Russia, co-processor reprograms you!

Re:Comrades! by maxwell+demon · 2006-04-24 01:39 · Score: 1

Yes, but does it run Linux?
Well, imagine a Beowulf cluster of them ...

--
The Tao of math: The numbers you can count are not the real numbers.

They didn't say "price/performance ratio" by p3d0 · 2006-04-24 01:18 · Score: 1

I think "price performance" would literally be the performance of the item's price. That is, how much performance each dollar of the price provides.
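Read that way, "price performance" is simply performance divided by price. A sketch with illustrative numbers only: a CPU as the 1.0x baseline at a hypothetical $1,000, and an accelerator assumed to be 15x faster for a hypothetical $5,000.

```python
def perf_per_dollar(performance: float, price_usd: float) -> float:
    """'Price performance' read as performance delivered per dollar spent."""
    return performance / price_usd

# Hypothetical figures for illustration, not vendor numbers.
cpu = perf_per_dollar(1.0, 1_000)
accel = perf_per_dollar(15.0, 5_000)
ratio = accel / cpu
print(f"accelerator delivers {ratio:.1f}x the performance per dollar")  # ... 3.0x ...
```

Note the direction: a strict price/performance *ratio* (dollars per unit of performance) would be the reciprocal, where lower is better.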

As for why they hyphenated it, I can't answer that one...

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

386 DX? by Anonymous Coward · 2006-04-24 01:42 · Score: 1, Interesting

Does anyone remember the good ol' days?

I still have my 386/40MHz + coprocessor.

And yes, AMD called me to come in to their lab with my ancient relic about a year ago.

Weitek? by Anonymous Coward · 2006-04-24 01:46 · Score: 1, Interesting

Do you remember the Weitek math co-processor (i386-era stuff)? It disappeared quite completely.

Also, there is a big fear of specialized hardware accelerators because they could be rigged in silicon, and you would never find out. With the functionality implemented in software on a general-purpose CPU, you at least have a chance to audit the code to find out whether the SSL handling has had an NSA backdoor added or some such. You buy a Chrysalis Luna VPN booster PCI card and can be assured that Mossad reads whatever you transact. Never, ever use hardware accelerators; they are salted to suit the secret services' taste. A general-purpose CPU and open source code is the only truly safe way of wisdom.

Re:Weitek? by zippthorne · 2006-04-24 06:17 · Score: 1

Why can't the cpu be similarly "salted?"

--
Can you be Even More Awesome?!

Choosing AMD Platform by sam0737 · 2006-04-24 01:47 · Score: 1

I think the reason they implement it as an Opteron is that, besides the Hypertransport thing, it gets its own exclusive set of memory. The co-processor doesn't have to share this with the system, which makes algorithms a lot easier (a big contiguous chunk of memory for matrix operations!) and the design much simpler: no MMU whatsoever.

Ad? by Anonymous Coward · 2006-04-24 02:04 · Score: 0

Is it just me, or did that whole article seem like one big ad?
That was a blatant marketing ploy.

There is nothing really 'new' about this technology; the only thing this product claims to do is improve performance by offloading specific tasks, so how is this any different than Ginsu knives claiming to cut bread (or a shoe) more easily, all for the low price of $30!@?#$$!@

No digg... oh wait, this is /.

You made my day by Anonymous Coward · 2006-04-24 02:11 · Score: 0

"but imagine JVM programmed into such co-processor on the hardware level"

Sounds familiar; this idea was proposed to speed up LISP too, and we know it didn't make LISP more popular. One would probably do better to find out whether RISC or CISC, and many registers or few, are more efficient for Java+JVM to work with.

"I imagine that if such coprocessor was programmed exactly to perform NN tasks"

Or you could just get one of those NN cards that are commonly available .. or just any FPGA ..

But then, I also doubted that pigs could fly, right up until the day one somehow did it.

FPGA vs. General Purpose CPU by DeadCatX2 · 2006-04-24 02:38 · Score: 4, Informative

Lots of other comments have made clear the point that it's not easy to program this kind of hardware. Typical software programs run in a very sequential manner. In fact, trying to get cooperative parallel execution of threads is known to be a major sticking point in the average programmer's education.

Hardware, on the other hand, is massively parallel. All the "gates" (*) are running all the time. It's like multi-threading a program, taken to the limit of infinity. And, if designed correctly, this thing can scale beyond belief, since it's all parallel.

It's also important to note that it's a Virtex4 on that card. That's one hell of an FPGA, they sure aren't cutting any corners. I'm not sure which one they're using, but some Virtex4 chips have PowerPC processors at 450 MHz.

This is definitely a niche product for now, due mainly to the lack of people who can write code in Hardware Description Languages (HDLs). But if you can figure it out, and you have an application that works on a massive scale, this may be for you.

Oh, and for all you detractors who are saying "that thing only runs at 500 MHz! How is it supposed to be faster than my 2 GHz AMD chip?" You're forgetting one very important factor. Your AMD chip executes one instruction at a time, and the important instructions are surrounded by instructions whose sole purpose is to control program flow or move data back and forth. However, the XtremeDSP slices of a Virtex4 can each execute a multiply and an add in a single cycle, and there are up to 512 of them in the most hardcore Virtex4 chip, and other logic executing in parallel can control the "program flow" and ferry data back and forth across the bus.
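The argument boils down to peak throughput = units × clock × operations per unit per cycle. A sketch of that arithmetic using the figures from the post (512 XtremeDSP slices, 500 MHz) and the post's deliberately simplified one-MAC-per-cycle model for a 2 GHz CPU, which ignores SIMD and superscalar issue:

```python
def peak_macs_per_second(units: int, clock_hz: float, macs_per_unit_per_cycle: int = 1) -> float:
    """Peak multiply-accumulate rate, assuming every unit is busy every cycle."""
    return units * clock_hz * macs_per_unit_per_cycle

fpga = peak_macs_per_second(units=512, clock_hz=500e6)  # 512 XtremeDSP slices
cpu = peak_macs_per_second(units=1, clock_hz=2e9)       # poster's one-at-a-time model
print(f"FPGA peak: {fpga / 1e9:.0f} GMAC/s, CPU: {cpu / 1e9:.0f} GMAC/s, ratio {fpga / cpu:.0f}x")
# FPGA peak: 256 GMAC/s, CPU: 2 GMAC/s, ratio 128x
```

These are peak numbers; a real design only gets close if the surrounding logic can keep every slice fed every cycle.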

*: Modern FPGAs are actually built out of SRAMs that can implement arbitrary logic functions. They're no longer arrays of gates, so to speak.
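The footnote's point is that a logic "gate" in an FPGA is really a tiny memory: a k-input lookup table (LUT) stores the 2^k-entry truth table, and the inputs simply form the address. A sketch of configuring and evaluating one 4-input LUT (function names are illustrative, not any vendor's tool API):

```python
def make_lut(func, n_inputs: int = 4) -> int:
    """Compute the truth table ("configuration bits") for an n-input LUT."""
    bits = 0
    for addr in range(1 << n_inputs):
        inputs = [(addr >> i) & 1 for i in range(n_inputs)]
        if func(*inputs):
            bits |= 1 << addr
    return bits

def eval_lut(config: int, *inputs: int) -> int:
    """Evaluate the LUT: the input bits form an address into the table."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return (config >> addr) & 1

# Configure one 4-input LUT as "(a AND b) XOR (c OR d)".
cfg = make_lut(lambda a, b, c, d: (a & b) ^ (c | d))
print(eval_lut(cfg, 1, 1, 0, 0))  # 1
print(eval_lut(cfg, 1, 1, 1, 0))  # 0
```

Any 4-input function fits in the same 16 bits, which is why the hardware no longer needs to be an array of fixed gates.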

--
:(){ :|:& };:

Re:FPGA vs. General Purpose CPU by TeknoHog · 2006-04-24 07:13 · Score: 1

*: Modern FPGAs are actually built out of SRAMs that can implement arbitrary logic functions. They're no longer arrays of gates, so to speak.
What a relief for the Linux crowd! We no longer have to imagine a Beowulf cluster of Vistas.

--
Escher was the first MC and Giger invented the HR department.

Might we ever have socketed Hypertransport GPU's? by Dr.+Spork · 2006-04-24 03:06 · Score: 4, Interesting

The fact that this is practical has made me wonder how well it would work to use a motherboard socket for a GPU. With Hypertransport it would have absolutely direct access to system ram and could help itself to as much as it needed. I would love to be able to use standard CPU heatsinks on a GPU.

But what I find really exciting about this idea is that once the GPU is in the motherboard, I'm sure programmers would find an easy way to use all that logic to do calculations - say, media encoding. Heck, I know they are trying to do this with GPU's on cards, but this would be a much lower latency connection.

I wonder how this would affect total system cost. I mean, I know multi-socket mobos will always cost more, but then again, when the GPU is a chip instead of a card, that should bring costs down. Also, they could ditch all that PCI-e logic and those slots. Upgrading would definitely be cheaper, and can you imagine two socketed GPUs on the mobo running a Hypertransport version of SLi? That might be the fastest, quietest gaming rig ever!

Hypertransport for general-purpose expansion? by Sloppy · 2006-04-24 03:40 · Score: 1

Who needs PCI-X and the like, when you could just plug your graphics coprocessor (and other things like that) into Hypertransport? Maybe some day we'll all be using motherboards with lots of socket 940s instead of traditional expansion slots.

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Cray XD1 has something similar by jaywee · 2006-04-24 04:05 · Score: 1

Btw, it should be noted that Cray has something very similar in their Cray XD1 (http://www.cray.com/products/xd1/acceleration.html), but there the Virtex is connected via Cray's proprietary interconnect.

10x - 20x performance? You betcha. by toybuilder · 2006-04-24 04:28 · Score: 2, Informative

A dedicated co-processor with enough registers to perform a complex calculation without having to constantly ferry register values between memory and the processor, combined with the ability to run several calculations simultaneously will blow the socks off a general purpose CPU for *very specifically designed algorithms*.

There's a market for GPU's on video cards running $1,200+... People that buy them won't be satisfied with standard GPU's no matter how fast their main processors run... The custom acceleration of graphics calculation makes it worthwhile.

Now, imagine doing massive calculations (think three blackboards filled with quantum physics equations) -- and you can see how some scientific/industrial applications would go ga-ga over this stuff...

On FPGAs as PC coprocessors -- latency rules by Jan · 2006-04-24 04:28 · Score: 3, Interesting

See earlier postings and blog entries on this concept:

http://www.fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html
http://www.fpgacpu.org/log/aug01.html#010821-dimm

The latency to the FPGA fabric largely determines what kinds of coprocessing workloads are feasible.

When hypertransport came out, we (FCCM'ers) knew a HT-based lower latency interconnect should be possible. (Though I wouldn't call 75 ns +/- "low" latency -- that's a couple of hundred instruction issue slots, or a bit more than 1 full cache miss.) But DRC has gone and done it. I love the way it (apparently) just drops in and can even use that socket's DRAM DIMMs. Congrats to Steve Casselman and co.
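The "couple of hundred instruction issue slots" figure is latency × clock × issue width. A sketch of that conversion, assuming a hypothetical 2.6 GHz CPU clock (the post does not name one); issue width is left as a parameter:

```python
def issue_slots(latency_ns: float, clock_ghz: float, issue_width: int = 1) -> float:
    """Instruction issue slots that elapse while waiting out a fixed latency."""
    return latency_ns * clock_ghz * issue_width  # ns * (cycles per ns) * slots per cycle

print(f"{issue_slots(75, 2.6):.0f} cycles")            # 195 cycles
print(f"{issue_slots(75, 2.6, 3):.0f} slots, 3-wide")  # 585 slots, 3-wide
```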

Re:Might we ever have socketed Hypertransport GPU' by bfree · 2006-04-24 04:34 · Score: 1

Another requirement would then probably be a second bank of RAM slots for "video" RAM... though this could be a great thing if the GPU becomes generally available as a co-processor, as presumably your faster video RAM could be used for another level of caching? It would also mean you could upgrade your GPU and video RAM separately. The only downside I can see is that your video outputs are likely to be tied to your motherboard, though perhaps that's where PCI-* steps back in (so you can pop in a video or audio card that is no more than inputs/outputs using the onboard DSP/RAM)? The only thing is, this starts to sound like winmodems, so getting it all working solidly could be a challenge.

--

Never underestimate the dark side of the Source

Look! Data Flow! by JumpingBull · 2006-04-24 05:07 · Score: 2, Insightful

As a cranky engineer, I find this ... sweet.
The best phrase to help the system design effort is data flow.
How does the machine chop up the task for the most performance?
The major problem in design is finding where to place the dotted line that says "cut here". Software mavens know this as refactoring, or partitioning.
The gotcha in development would be to ignore the internal architecture of the FPGA.
As a word of advice to the beginner, look carefully at the FPGA data flow, and try to decompose the algorithm (or find a similar one) so that the data manipulation and movement fits the part as well as possible.
Just having an HDL is not enough; the neophyte hardware designer can easily write code that cannot be synthesised to work, let alone fit the part. A sensitivity to the underlying hardware is needed.
As an example of this, using hand crafted hardware design, Chuck Moore wrung several times the expected clock performance for a hardware Forth engine. A starting point for reading might be:
http://www.ultratechnology.com/cowboys.html
Using hand-crafting, you can get enormous processing gains, but the hardware and system designs have to be well understood.
Perhaps the GNU uber-geeks could handle the translation efforts to make a tool for the average application programmer, but until then the brave soul who tackles these efforts should be prepared to learn a lot about the edges of computer science, hardware, and system design. It's not a horrible job, just long. And the problem should be worthy of the effort needed.

--
This is progress?

Re:Might we ever have socketed Hypertransport GPU' by FuturePastNow · 2006-04-24 05:46 · Score: 1

Probably not, because only expensive server processors like the Opteron and Xeon can be used in multi-CPU systems. While there are a few gamers who use the Opteron (probably they just like to spend money), they don't have the volume to justify producing such a thing. It would cost $5000 like the FPGA in the article, and probably not be updated as often as regular GPUs.

Now, a programmable co-processor on a PCIe x16 card... I'd like to be able to encode a movie in five minutes.

--
Give a man fire, and you warm him for the night. Set a man on fire, and you warm him for the rest of his life.

Re:So... - I just want to add an insane amount of by Anonymous Coward · 2006-04-24 06:01 · Score: 0

I just want to use the FPGA to add an insane amount of memory and present it to the other cores over HT. It seems most people missed this possibility.

Optimize Audio by ElitistWhiner · 2006-04-24 06:09 · Score: 2, Interesting

New polyphonic software instruments rely on CPU cycles. More cycles don't just sound better, they sound very different. Musicians are at a tipping point right now. Old-fashioned instruments that are standard on stage and on tour are becoming brittle and expensive. Collectors are snapping up the old instruments at prices north of $5K USD, reducing their availability to working professionals. Hammond B3s are going for as much as $16K, Selmer Mk6 saxes for $6K.

Software instruments are a necessity going forward. It's imperative to find a scalable system that is stateless and transparent to the performance.

Re:Might we ever have socketed Hypertransport GPU' by evilviper · 2006-04-24 06:18 · Score: 1

But what I find really exciting about this idea is that once the GPU is in the motherboard, I'm sure programmers would find an easy way to use all that logic to do calculations - say, media encoding.

Now I'm confused. This sounds about like someone saying: "Now that they've got hybrid technology in cars, they should put it in trucks. Then we can take the trucks and make them smaller by removing the truck bed, and put more seats in. Maybe even put a car body on it..."

What do you think this FPGA is for, exactly??? It's designed to do exactly those kinds of calculations you want to do with a repurposed GPU... Except, of course, you're raising the difficulty level significantly by doing that.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

If you're going to require socket 940 platforms... by Junta · 2006-04-24 07:37 · Score: 1

Might as well make it an HTX card rather than use up a CPU socket that could otherwise be driving more memory (the picture implies no attached memory). Most upcoming 940 platforms now have HTX slots, offering equivalent performance/latency via a leftover Hypertransport link.

--
XML is like violence. If it doesn't solve the problem, use more.

XD1000 is a similar concept product by Anonymous Coward · 2006-04-24 07:44 · Score: 0

Yet another startup has created a similar product. The XD1000 from XtremeData Inc. also plugs into an Opteron 940 socket and carries an FPGA, but an Altera Stratix II instead. Again, the openness of Hypertransport allows full-speed access to the other Opteron and also to memories, etc.

See http://www.xtremedatainc.com/Products.html

I imagine there will be a patent battle in the future, or one will take over the other....

Not an employee of XtremeData...

Re:Might we ever have socketed Hypertransport GPU' by elvum · 2006-04-24 08:34 · Score: 2, Insightful

I think the parent was suggesting that CPU-socket devices could be produced that were marketed as GPUs but could be used to assist other CPU-bound processes. Whether or not said devices are designed as graphics chips or general-purpose logic devices is another question.

WRT your vehicular analogy, there are people who buy cars and want to use them as trucks occasionally, and people who buy trucks but sometimes just use them as cars. It's no big deal.

Re:Might we ever have socketed Hypertransport GPU' by flaming-opus · 2006-04-24 09:16 · Score: 1

The advantage of PCIe is that the specification will likely stick around for 6-7 more years and will be used on both Intel and AMD systems, as well as some of the more obscure architectures. You can design your graphics card knowing that just about anyone will be able to throw it into their box, which gives you a very large addressable market. Putting the GPU into a processor socket cuts your addressable market down to AMD boards only.

Furthermore, GPUs do require a fair amount of bandwidth, but are a lot more latency-tolerant than a processor. PCIe is well suited to the job; I don't see why you'd want to supplant a slot designed for I/O boards with a socket designed for something else.

Reading from the article... for the PARANOID by NRAdude · 2006-04-24 10:01 · Score: 0

I thought it quite odd to have a special-purpose, closed, proprietary processor installed in the socket meant for a trusted general-purpose monolithic CPU, where it could in effect copy and divert data through a subsystem by eavesdropping on the local host and the input at its console. Remember, folks, the modern BIOS was created to allow remote booting, while firewall software is limited to regulating the communications of software, and only over IP. Wireless technology has yet to provide a BIOS standard for remote boot, but I've heard of cases where all drivers are written to the BIOS for embedded systems. Considering a "trigger", why would someone ask for intelligence from an agent of a company or corporation named "Illuminata"? CPUID comes nowhere close to what that special-purpose processor in a general-purpose interface could do to intercept the BUS. Consider this quote from the article:

"People have tried a lot of special purpose processing devices over the years and, with the exceptions of graphics units and arguably floating point units, general purpose processors have always won out in the end," said Gordon Haff, an analyst at Illuminata.

To query the intellect of one of the most highly qualified experts on the security of computing systems: What Would Theo de Raadt Say?

--
without prejudice

This had allready been done so it seems ... by Jarth · 2006-04-24 10:11 · Score: 1

http://www.xtremedatainc.com/Products.html

It shows a product very similar to the DRC Computer product. I'm not technical, and as such not able to supply any valuable comments, but I'd be glad to hear from you.

--
free dom(inion) - free energy - free your mind - whee!

A thousand times faster - it's already here. by Anonymous Coward · 2006-04-24 14:39 · Score: 0

While you're right about the "programming" problem (lack of standard tools, far fewer skilled HDL programmers), the capabilities are there today. I saw a demonstration of a $50 FPGA that took an image from a CCD camera, ran an edge-detection algorithm on it, mapped the image onto the faces of a cube, and displayed that cube rotating in 3-D space. The FPGA was being clocked at 50 MHz.

The same algorithm was being done on a 1.6 GHz Pentium M and the result was a significantly slower rotation/update rate.

Up-clock the FPGA to 200 MHz (even the low-cost FPGAs can do this) and replicate the design to do eight cubes (cut & paste the HDL), and you've got your three orders of magnitude performance advantage.
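Spelled out, the two scaling steps multiply; whatever is left of the thousand-fold target has to come from the 50 MHz demo's original lead over the Pentium M, which the post does not quantify. A sketch of that arithmetic, assuming clock and replication scale perfectly linearly (no I/O or memory bottlenecks):

```python
clock_gain = 200e6 / 50e6   # up-clocking from 50 MHz to 200 MHz
replication_gain = 8        # eight copies of the cube pipeline
scaling = clock_gain * replication_gain
target = 1000               # "three orders of magnitude"
needed_baseline = target / scaling
print(f"scaling alone: {scaling:.0f}x; baseline FPGA advantage needed: {needed_baseline:.2f}x")
# scaling alone: 32x; baseline FPGA advantage needed: 31.25x
```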

Re:A thousand times faster - it's already here. by gfody · 2006-04-24 17:55 · Score: 1

What was the resolution of this image? The poor performance on the 1.6 GHz Pentium could be due to inefficient x86 code. I made a 3D texture-mapped spinning cube that ran at 30 fps on my old 100 MHz 486DX4. The image resolution was 256x256 at 8 bpp.

--

bite my glorious golden ass.

Re:If you're going to require socket 940 platforms by GaryX · 2006-04-25 07:36 · Score: 1

An HTX card? How do you get 30 watts of power to it? How do you cool the FPGA? Where do you put the 8 GB of DRAM available to the Opteron socket? How do you build a single board that will fit in the form factor available in most Opteron systems? Tough questions. And then there are places like ATCA systems that have no plug-in slots at all. XtremeData's XD1000 fits in all dual Opteron systems.

Gary
Hardware Architect
XtremeData Inc
www.xtremedatainc.com

Re:If you're going to require socket 940 platforms by Junta · 2006-04-25 10:57 · Score: 1

30 watts could be clunkily delivered via a direct-attach cable from the PSU. Cooling is probably a fair issue: most accelerators on the market today don't have that much heat to dissipate, so meeting PCI-spec-like requirements with a cooler is an issue for 1U systems that don't allow multi-width cards at all. The conjecture had been that the DIMM sockets were not being utilized, and that was my chief concern about a device sitting in that spot and crippling the memory capacity. The HTX slot is fairly standardized and becoming more prevalent in the market, so building a single board isn't much of a challenge, though admittedly some 940 platforms do not have that slot.

The other issue is how it deals with other devices that may be connected via HyperTransport to the socket it occupies. For example, one strategy I see pursued to get more inter-node IO is a four-socket Opteron where each socket uses two links to reach two other processors and brings the third out for inter-node IO via PCI-E chips or HTX.
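A toy model of that topology, assuming the four sockets are wired as a square ring (0-1-2-3-0) and that an accelerator module dropped into socket 2 does not forward traffic between its neighbours. Both are assumptions for illustration, not a description of any shipping board:

```python
from collections import deque

# Assumed wiring: each socket links to two neighbours in a square ring;
# its third HyperTransport link is brought out for inter-node I/O.
RING = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def hop_counts(graph, start, skip=None):
    """Breadth-first search from `start`; returns {socket: hops}, never entering `skip`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt != skip and nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Two processor links plus one I/O link: every socket uses all three.
links_used = {s: len(nbrs) + 1 for s, nbrs in RING.items()}

# All four sockets are CPUs: socket 1 reaches socket 3 in two hops.
normal = hop_counts(RING, 1)

# Socket 2 holds a non-forwarding module: sockets 1 and 3 still meet via 0,
# but the I/O on socket 2's third link is reachable only through the module.
degraded = hop_counts(RING, 1, skip=2)

print(links_used)  # {0: 3, 1: 3, 2: 3, 3: 3}
print(normal)      # {1: 0, 0: 1, 2: 1, 3: 2}
print(degraded)    # {1: 0, 0: 1, 3: 2}
```

In this toy ring the CPUs stay connected without extra hops, so the open question reduces to whatever hangs off the occupied socket's third link.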

It could be interesting compared to ClearSpeed offerings, but the platforms I work with have moved on to AM2 for the future...

--
XML is like violence. If it doesn't solve the problem, use more.

Re:If you're going to require socket 940 platforms by GaryX · 2006-04-25 23:41 · Score: 1

The DRAM is actually the most important issue. We use all the available DRAM DIMMs on the motherboard, so no system resources are wasted. This is a very nasty problem for plug boards to deal with.

Slashdot Mirror

Start-up Could Kick Opteron into Overdrive

127 comments