Start-up Could Kick Opteron into Overdrive

Posted by ryuzaki0 on Sunday April 23, 2006 @11:36PM from the true-niche-markets dept.

An anonymous reader writes "The Register is reporting that a new start-up, DRC Computer, has created a reprogrammable co-processor that can slot directly into Opteron sockets. This new product has the potential to boost the Opteron chips well ahead of their Xeon-based competition. From the article: 'Customers can then offload a wide variety of software jobs to the co-processor running in a standard server, instead of buying unique, more expensive types of accelerators from third parties as they have in the past.'"

Berkeley by 2.7182 · 2006-04-23 23:38 · Score: 2, Interesting

I thought they had done this out at Berkeley a while back. Is it really a new thing?
1. Re:Berkeley by Whiney+Mac+Fanboy · 2006-04-24 00:00 · Score: 4, Interesting
  
  I'm sorry but 5k for a little chip that makes my opteron a little faster? I could just buy another opteron for that price: http://www.pricewatch.com/cpu/419325-1.htm The price is supposed to drop to 3k next year.
  
  You're quite right that these are not for you - they're meant to run highly specialised calculations (the oil & gas industries are mentioned in TFA).
  
  They make some operations much faster (think of a hardware mpeg decoder: useless for most things, but much more efficient at the single thing it can do than a general purpose CPU).
  
  How does this affect cooling?
  
  These things consume 10-20 watts compared to an Opteron's 80, so their effect on cooling is minimal (far less than adding the second Opteron that you propose).
  
  --
  There are shills on slashdot. Apparently, I'm one of them.
2. Re:Berkeley by drgonzo59 · 2006-04-24 02:17 · Score: 2, Interesting
  
  A fast FFT processor, for example, would make life easier for a lot of Photoshop filter users (with the help of special drivers and plugins). It would also help GNU Radio quite a bit, as well as other multimedia/signal/data processing applications.
Kick ass synth? by Max+Romantschuk · 2006-04-23 23:43 · Score: 3, Interesting

This could really be an interesting way to boost real time soft synths... Even with top of the line processors the more complex ones will bring a CPU to its knees. Seems like a more sensible option compared to a DSP-filled expansion card. Too bad this thing is still a little on the expensive side to be viable for the music software market.

--
.: Max Romantschuk :: http://max.romantschuk.fi/
1. Re:Kick ass synth? by Pulzar · 2006-04-24 02:06 · Score: 3, Interesting
  
  An FPGA does not make a very good DSP for the price. I suppose if it's one of the nicer ones from the Virtex series, you can get it to do DSP, but it won't be as good as the processor already in the PC.
  
  That's not true, at all. An FPGA will not be as good a general-purpose DSP as a custom-made DSP, but it will still be better than a CPU -- even the low-cost Cyclone II comes with 150 dedicated multipliers coupled with embedded memory, so they can do parallel multiply/accumulate at 700+ MHz. And these are the low-end FPGAs...
  
  Now, if you're actually programming the FPGA using custom-designed circuitry optimized for the task you're working on, the FPGA will work a lot better than a general-purpose DSP, and be way ahead of an even more general purpose CPU. That's why you don't see generic DSPs being used in heavy DSP work (say, in telcos), but custom and semi-custom ASICs, and FPGAs in smaller environments.
  
  --
  Never underestimate the bandwidth of a 747 filled with CD-ROMs.
Re:So... by BenjyD · 2006-04-23 23:53 · Score: 4, Interesting

The article mentions applications in gas and oil companies. I would guess that means things like:

- MINLP/MILP (Wikipedia article is a bit weak) and Branch and Bound optimisation for things like pipeline routing, well selection etc.
- fluid mechanics for pipeline design
- geological data-mining for finding reservoirs
Those kinds of jobs can have runtimes measured in days and weeks, so an accelerator could make a real difference.
Re:So... by bhima · 2006-04-23 23:55 · Score: 2, Interesting

I would dearly love a cryptoprocessor, and looking at the specs it doesn't look all that far away.

--
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Er.... question by brunes69 · 2006-04-24 00:00 · Score: 1, Interesting

"DRC's flagship product is the DRC Coprocessor Module that plugs directly into an open processor socket in a multi-way Opteron system," the company notes on its web site.
If you have an open Opteron socket on your multi-way box, wouldn't you probably achieve better performance by shoving another Opteron into there?
I mean, sure, I can see the benefit of having a co-processor customized to handle your specific workload. But another Opteron would likely run at multiples of the clockspeed of that thing, and it would also be able to offload work from the *other* Opterons, such as disk I/O etc, giving your overall application more performance.
1. Re:Er.... question by andrewmc · 2006-04-24 01:43 · Score: 3, Interesting
  
  True, but if the microprocessor's clock speed is hundreds of thousands of times faster than the FPGA, then you are even again. There's no clock speed for this device in the article so we can't really compare.
  
  Clock speed often depends on the circuit design put onto the FPGA. If you got your FPGA design running at even 100MHz (not unrealistic), you're maybe 30 times behind a general-purpose CPU. But not only are you running hundreds of instructions per cycle, but those instructions are specific to the application and probably many times more efficient.
  
  It's probably not useful for making short-lived applications faster, but for seriously repetitive number-crunchy work like weather predictions, oil drilling, etc, where there are trillions of small-scale computations, the highly-parallel nature of the FPGA has great potential.
  
  Also, if those small-scale computations need to interact for any reason, on-chip communication is far faster than any chip-to-chip could be. And that's happening in parallel, too.
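  
  The clock-versus-parallelism trade-off above can be put into rough numbers. This is only a back-of-envelope sketch: the 100MHz FPGA clock and the 30x clock gap come from the comment, while the 200 operations per FPGA cycle and the 3-wide CPU issue rate are assumed figures picked for illustration, not measurements of the DRC module or of any Opteron.

```python
# Back-of-envelope peak throughput comparison.
# Only the 100 MHz FPGA clock and the 30x clock ratio come from the comment;
# the per-cycle figures are illustrative assumptions.

def ops_per_second(clock_hz: float, ops_per_cycle: float) -> float:
    """Peak operations per second for a device."""
    return clock_hz * ops_per_cycle

FPGA_CLOCK_HZ = 100e6                 # "even 100MHz (not unrealistic)"
CPU_CLOCK_HZ = 30 * FPGA_CLOCK_HZ     # "maybe 30 times behind"
FPGA_OPS_PER_CYCLE = 200              # assumed: "hundreds of instructions per cycle"
CPU_OPS_PER_CYCLE = 3                 # assumed CPU issue width

fpga = ops_per_second(FPGA_CLOCK_HZ, FPGA_OPS_PER_CYCLE)
cpu = ops_per_second(CPU_CLOCK_HZ, CPU_OPS_PER_CYCLE)
speedup = fpga / cpu

print(f"FPGA peak: {fpga:.2e} ops/s")
print(f"CPU peak:  {cpu:.2e} ops/s")
print(f"FPGA/CPU:  {speedup:.2f}x")
```

  Under these assumptions the FPGA comes out ahead despite the slower clock, and that is before counting any gain from operations tailored to the application. Change the two assumed per-cycle figures and the ratio moves accordingly.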
High end gaming? by Barny · 2006-04-24 00:03 · Score: 3, Interesting

Even though I only know of 3 people that use 940 socket machines for gaming (2 of them dual cpu rigs), I believe an Ageia PhysX processor modded to the socket would be a good idea. The combination of an extremely fast cpu-ppu bus and being able to use stock (well, reg ecc ram is kinda stock) ram to feed it would help to make multi socket opterons a very viable gaming platform, although as those 3 peeps (and me after seeing the BoM) know, it would not be a cheap one.

--
...
/me sighs
neural networks or java? by Janek+Kozicki · 2006-04-24 00:06 · Score: 3, Interesting

I'm not a fan of java, but imagine a JVM programmed into such a co-processor at the hardware level (which it should be capable of). I bet it would be a very interesting option for some people. Servers running on java, anyone?

But I'm a fan of neural networks, and I imagine that if such a coprocessor were programmed specifically to perform NN tasks it could bring "brain simulation" a few steps closer - especially if many such coprocessors were put into the system.

--
# #\ @ ? Colonize Mars #
About Time! by evilviper · 2006-04-24 00:39 · Score: 4, Interesting

I have to say, I'm surprised it has taken so long. Seems a few years past-due, IMHO.

One of the first signs that PCs needed an FPGA or similar was hardware MPEG capture cards... They could do the job so much faster, and so much cheaper than your primary CPU, that the alternative is disappearing.

High-end graphics cards have been the most telling development. It's not that OpenGL is something magical, it's just that an ASIC can do many things so much better than a CPU that transferring much, much more raw data over the bus was still cheaper than actually processing it (despite the fact that interrupts are rather costly themselves).

PS2 clusters, Crypto cards, Hardware-accelerated NICs, SLI, all are symptoms of almost exactly the same problem...

The rising popularity of GPU programming made it extremely clear that there is a vacuum here. Using the videocard isn't a very good method to accomplish this, just a stop-gap necessity. I thought from the beginning that FPGAs would become like the old math-coprocessors, and have their own motherboard socket, but neither AMD nor Intel was stepping up to fill this clear need. Installing it into a normal CPU socket, to get around this apathy, is a very clever hack I hadn't thought of.

I expect, with popularity, it will be cheaper to put a custom FPGA socket on motherboards, rather than building a full-fledged SMP motherboard for the purpose. After that, who knows... Maybe FPGAs will go the way of the math-coprocessors and get integrated into future CPUs.

I know if I was running ATI or NVidia (or Hauppauge, or Level5), I'd be very worried about this thing eating the most profitable segment of my market.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
1. Re:About Time! by TheRaven64 · 2006-04-24 00:55 · Score: 4, Interesting
  I was at a talk by Bob Colwell a few weeks ago. One of the points he made was that within the next ten years we will be able to (economically) fit far more transistors on a chip than we realistically know what to do with. His example was using all of that space to have a vast array of P6 cores. If you did this, then:
  
  You would not be able to get enough power to the chip to make it work.
  
  You would not be able to dissipate the heat that it would draw.
  
  You would not be able to get enough data to it for more than about 10% (on a good day) of the chip real-estate to be actually doing anything.
  
  One possible solution is to have a hundred or so general purpose cores, and fill the rest up with simple algorithmic accelerators (e.g. FFT, crypto, [i]D{C,W}T, etc). These would spend most of their time turned off (not using power), but when a workload hit the chip that needed them they could be turned on to give a significant performance boost.
  --
  I am TheRaven on Soylent News
An old method, not really suitable nowadays by Flying+pig · 2006-04-24 00:41 · Score: 4, Interesting

This worked fine in the days of embedded systems, when all memory was static (and usually only 16 bits wide) and it was easy to wire an interrupt line so that you could read the results when the add-on had finished. Nowadays it is much more difficult because of the need to integrate with DRAM controllers and timing, and the absence of convenient interrupts (so you need to poll a location to see when it has completed). Hypertransport, by contrast, is designed to do the job and do it efficiently.
Another nice approach was the "swinging gate" RAM method in which you had two blocks of physical RAM in the same memory space. The main CPU filled one block with data, then flicked the switch so the co-processor could read that data while the CPU read the results from the other block, then put in new data for processing in the next cycle. Very easy to implement, much cheaper than FIFOs. It meant you could use a cheap DSP (from TI) in a system using a cheaper 8086 series processor for which you could get cheap tools and an embedded OS.
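
For anyone who hasn't met the "swinging gate" scheme, here is a minimal software simulation of it, as a sketch only. The class name `PingPongBuffer`, the `swing` method, and the squaring "coprocessor" are all made up for illustration; in the real hardware the two sides run concurrently and the gate is a physical switch on the memory bus, whereas here each cycle is simply run one side after the other.

```python
class PingPongBuffer:
    """Two equal-sized blocks; the gate decides which block each side sees."""

    def __init__(self, size):
        self.blocks = [[0] * size, [0] * size]
        self.cpu_side = 0  # index of the block the CPU currently owns

    @property
    def cpu_block(self):
        return self.blocks[self.cpu_side]

    @property
    def coproc_block(self):
        return self.blocks[1 - self.cpu_side]

    def swing(self):
        """Flick the switch: the two sides exchange blocks."""
        self.cpu_side = 1 - self.cpu_side


def coprocessor_step(block):
    """Stand-in for the DSP: process the block in place (squares each sample)."""
    for i, x in enumerate(block):
        block[i] = x * x


def run(frames, size):
    """Push frames (lists of length `size`) through the two-block pipeline."""
    buf = PingPongBuffer(size)
    results = []
    cpu_block_has_results = False
    coproc_block_has_data = False
    # Two extra empty cycles at the end drain the pipeline.
    for frame in list(frames) + [None, None]:
        # Coprocessor side: work on the block it currently owns.
        if coproc_block_has_data:
            coprocessor_step(buf.coproc_block)
        # CPU side: read last cycle's results, then load new data.
        if cpu_block_has_results:
            results.append(list(buf.cpu_block))
        if frame is not None:
            buf.cpu_block[:] = frame
        # Swing the gate; what the coprocessor just finished is now CPU-visible.
        buf.swing()
        cpu_block_has_results = coproc_block_has_data
        coproc_block_has_data = frame is not None
    return results


if __name__ == "__main__":
    print(run([[1, 2], [3, 4], [5, 6]], size=2))
```

The point of the design is that neither side ever touches the block the other is using, so no FIFO or arbitration logic is needed; the cost is a two-cycle delay between loading a frame and reading its result.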

--
Pining for the fjords
386 DX? by Anonymous Coward · 2006-04-24 01:42 · Score: 1, Interesting

Does anyone remember the good ol' days?

I still have my 386/40MHz + coprocessor.

And yes, AMD called me to come in to their lab with my ancient relic about a year ago.
Weitek? by Anonymous Coward · 2006-04-24 01:46 · Score: 1, Interesting

Do you remember the Weitek math co-processor (i386-era stuff)? It disappeared quite completely.

Also there is a big fear of specialized hardware accelerators because they could be rigged in silicon, and you would never find out. With the functionality implemented in software on a general-purpose CPU you at least have a chance to audit the code to find out if the SSL handling has some NSA backdoor added or so. You buy a Chrysalis Luna VPN booster PCI card and assuredly know Mossad reads whatever you transact. Never ever use hardware accelerators, they are salted to suit the secret services' taste. A general purpose CPU and open source code is the only truly safe way of wisdom.
Might we ever have socketed Hypertransport GPUs? by Dr.+Spork · 2006-04-24 03:06 · Score: 4, Interesting

The fact that this is practical has made me wonder how well it would work to use a motherboard socket for a GPU. With Hypertransport it would have absolutely direct access to system ram and could help itself to as much as it needed. I would love to be able to use standard CPU heatsinks on a GPU.
But what I find really exciting about this idea is that once the GPU is in the motherboard, I'm sure programmers would find an easy way to use all that logic to do calculations - say, media encoding. Heck, I know they are trying to do this with GPUs on cards, but this would be a much lower latency connection.
I wonder how this would affect total system cost. I mean, I know multi-socket mobos will always cost more, but then again, when the GPU is a chip instead of a card, that should bring costs down. Also, they could ditch all that PCI-e logic and those slots. Upgrading would definitely be cheaper, and can you imagine two socketed GPUs on the mobo running a Hypertransport version of SLi? That might be the fastest, quietest gaming rig ever!
Re:Speed by dubl-u · 2006-04-24 03:55 · Score: 3, Interesting

The first set of DRC modules will consume about 10 - 20 watts versus close to 80 watts for an Opteron chip.
People buying USD$5000 coprocessors, plus the cost of developing specialized code to use them, don't cut corners on the basis of their electric bill.

You're doing the math wrong. For decent colo space, I pay somewhere around $150 per rack-unit year and $120 per amp-year. If the coprocessor is really 10-20x faster for my workload, I don't just save the half-amp on one coprocessor; I get the savings on servers I don't need. Just in rack and power costs, one coprocessor would save at least $4k per year.

In other words, for its target audience, the $5k coprocessor would be more than paid for by the infrastructure savings alone. If you're the kind of company that is buying a few $5k coprocessors to replace $100k in servers, I hope you're thinking about your electricity bill, as it will be more than $25k over the lifespan of those machines.
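
The "at least $4k per year" figure can be reproduced with a quick calculation. Treat this as a sketch: the $150 per rack-unit year and $120 per amp-year rates are from the comment, but the server profile (1U, drawing about 2.5 amps) and the reading of "10x faster" as "nine servers no longer needed" are assumptions chosen for illustration. The coprocessor's own small power draw is ignored.

```python
RACK_UNIT_COST = 150.0  # $ per rack-unit year (from the comment)
AMP_COST = 120.0        # $ per amp-year (from the comment)


def yearly_colo_cost(rack_units, amps):
    """Yearly rack and power cost of one machine."""
    return rack_units * RACK_UNIT_COST + amps * AMP_COST


def yearly_savings(speedup, server_rack_units=1, server_amps=2.5):
    """Rack and power savings if one accelerated server replaces `speedup` plain ones.

    server_rack_units and server_amps are assumed values, not from the comment.
    """
    servers_removed = speedup - 1
    return servers_removed * yearly_colo_cost(server_rack_units, server_amps)


if __name__ == "__main__":
    for speedup in (10, 20):
        print(f"{speedup}x speedup: ${yearly_savings(speedup):,.0f} saved per year")
```

With those assumed servers, a 10x speedup lands a little over $4k per year, which is in line with the figure quoted above; servers that are larger or hungrier than the assumed profile push the savings higher.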
On FPGAs as PC coprocessors -- latency rules by Jan · 2006-04-24 04:28 · Score: 3, Interesting

See earlier postings and blog entries on this concept:

http://www.fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html
http://www.fpgacpu.org/log/aug01.html#010821-dimm

The latency to the FPGA fabric largely determines what kinds of coprocessing workloads are feasible.

When hypertransport came out, we (FCCM'ers) knew a HT-based lower latency interconnect should be possible. (Though I wouldn't call 75 ns +/- "low" latency -- that's a couple of hundred instruction issue slots, or a bit more than 1 full cache miss.) But DRC has gone and done it. I love the way it (apparently) just drops in and can even use that socket's DRAM DIMMs. Congrats to Steve Casselman and co.
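
To see why 75 ns is not really "low", here is the conversion from latency to lost issue slots, as a rough sketch. The 75 ns figure is from the comment; the 2.4 GHz clock and the issue widths are assumed values for a CPU of that era, not specifications of any particular Opteron.

```python
def issue_slots_lost(latency_ns, clock_mhz, issue_width=1):
    """Instruction issue slots that pass while waiting `latency_ns` nanoseconds.

    clock_mhz and issue_width describe the host CPU and are assumptions here.
    """
    cycles = latency_ns * clock_mhz / 1000.0  # ns * MHz / 1000 = cycles
    return cycles * issue_width


if __name__ == "__main__":
    latency_ns = 75    # latency quoted in the comment
    clock_mhz = 2400   # assumed host clock
    for width in (1, 3):
        slots = issue_slots_lost(latency_ns, clock_mhz, width)
        print(f"{width}-wide issue: about {slots:.0f} slots per round trip")
```

Any job handed to the FPGA has to save more work than that per round trip before it is worth offloading, which is why the latency largely decides which workloads are feasible.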
Re:What about sdram slots by Brietech · 2006-04-24 04:33 · Score: 2, Interesting

There is actually a lot of active research going on in this field right now. It is called "Processor-in-memory" architecture, and it's best for handling things like array-based calculations, where you would otherwise need to make a number of off-chip memory calls to complete. Staying completely on-chip makes it much faster, and it allows the embedded proc to take advantage of the internally wide (~256-bit) data path of modern memory. Look up Project DIVA and Project MONARCH; it is all DARPA-sponsored research, but the university I attend (USC) has a number of researchers involved with it.

--
I'm perfect in every way, except for my humility.
Optimize Audio by ElitistWhiner · 2006-04-24 06:09 · Score: 2, Interesting

New polyphonic software instruments rely on CPU cycles. More cycles sound not only better but much different. Musicians are at a tipping point at this moment in time. Old fashioned instruments which are standards on stage and tour are becoming brittle and expensive. Collectors are snapping up the old instruments at prices north of $5K USD reducing the availability of instruments for playing professionals. The Hammond B3's are going as high as $16K. Selmer Mk6 saxes $6K.

Software instruments are a necessity going forward. It's imperative to find a scalable system that is state-less and transparent to the performance.