Start-up Could Kick Opteron into Overdrive

Posted by ryuzaki0 on Sunday April 23, 2006 @11:36PM from the true-niche-markets dept.

An anonymous reader writes "The Register is reporting that a new start-up, DRC Computer, has created a reprogrammable co-processor that can slot directly into Opteron sockets. This new product has the potential to boost the Opteron chips well ahead of their Xeon-based competition. From the article: 'Customers can then offload a wide variety of software jobs to the co-processor running in a standard server, instead of buying unique, more expensive types of accelerators from third parties as they have in the past.'"

13 of 127 comments

Why read a re-written press release by Threni · 2006-04-23 23:46 · Score: 4, Informative

when you can just read about it on the company's website?

http://www.drccomputer.com/pages/products.html
Re:So... by BenjyD · 2006-04-23 23:53 · Score: 4, Interesting

The article mentions applications in gas and oil companies. I would guess that means things like:

- MINLP/MILP (mixed-integer nonlinear/linear programming; the Wikipedia article is a bit weak) and branch-and-bound optimisation for things like pipeline routing, well selection etc.
- fluid mechanics for pipeline design
- geological data-mining for finding reservoirs
Those kinds of jobs can have runtimes measured in days and weeks, so an accelerator could make a real difference.
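To give a feel for branch and bound, here is a toy version of the well-selection problem treated as a 0/1 knapsack: choose a subset of candidate wells that maximises total payoff within a drilling budget. The `Candidate` type, its payoff/cost fields and the knapsack framing are illustrative assumptions, not anything from the article; real MILP solvers add presolve, cutting planes and a great deal more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    value: float  # hypothetical expected payoff
    cost: float   # hypothetical drilling cost; must be > 0


def best_selection(cands, budget):
    """0/1 selection maximising total value within a budget, by branch and bound."""
    # Sort by value density so the optimistic bound below is a simple greedy pass.
    items = sorted(cands, key=lambda c: c.value / c.cost, reverse=True)
    best_value = 0.0
    best_pick = []

    def bound(i, value, room):
        # LP-relaxation bound: add remaining items greedily, allowing a fraction of one.
        for c in items[i:]:
            if c.cost <= room:
                room -= c.cost
                value += c.value
            else:
                return value + c.value * room / c.cost
        return value

    def branch(i, value, room, pick):
        nonlocal best_value, best_pick
        if value > best_value:
            best_value, best_pick = value, list(pick)
        if i == len(items) or bound(i, value, room) <= best_value:
            return  # prune: this subtree cannot beat the best solution found so far
        c = items[i]
        if c.cost <= room:  # branch 1: take item i
            pick.append(c.name)
            branch(i + 1, value + c.value, room - c.cost, pick)
            pick.pop()
        branch(i + 1, value, room, pick)  # branch 2: skip item i

    branch(0, 0.0, budget, [])
    return best_value, sorted(best_pick)


wells = [Candidate("A", 60, 10), Candidate("B", 100, 20), Candidate("C", 120, 30)]
print(best_selection(wells, budget=50))  # (220.0, ['B', 'C'])
```

The bound is what makes the method pay off: whole subtrees are discarded as soon as their optimistic estimate cannot beat the incumbent, which is where days-long runs get cut down.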
A bit more accurate summary by subreality · 2006-04-23 23:54 · Score: 5, Informative

They basically made an FPGA (field-programmable gate array) that plugs directly into HyperTransport (the Opteron CPU bus). FPGAs can solve many problems far more efficiently than a general-purpose processor can. This has been done with PCI cards before, but PCI is too slow for many uses. Giving the FPGA direct access to HT solves that problem.

That's a pretty cool niche.
price performance by sfraggle · 2006-04-23 23:56 · Score: 4, Funny

From the article:
"We have taken the approach that we must deliver three times the price-performance of a standard blade."
Isn't this BAD? Three times the Price/performance ratio would imply a higher price, or worse performance.
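The joke turns on which way up the ratio is written. A tiny sketch with made-up numbers (not figures from the article) shows that "three times the price-performance" only reads as good news if it means performance per unit of price:

```python
def price_per_performance(price, performance):
    """Price per unit of work: lower is better."""
    return price / performance


def performance_per_price(price, performance):
    """Units of work per unit of price: higher is better."""
    return performance / price


# Hypothetical figures: same price, three times the throughput.
blade = {"price": 4096.0, "performance": 128.0}
accelerated = {"price": 4096.0, "performance": 384.0}

gain = performance_per_price(**accelerated) / performance_per_price(**blade)
cost_ratio = price_per_performance(**accelerated) / price_per_performance(**blade)
print(f"performance per price: {gain:.2f}x, price per performance: {cost_ratio:.2f}x")
```

Read as performance per price, the accelerated box is 3x better; read literally as price divided by performance, the same improvement shows up as a ratio one third the size.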

--
were you expecting to see a sig here? perhaps you'd rather see the inside of an ambulance!
Re:Berkeley by Whiney+Mac+Fanboy · 2006-04-24 00:00 · Score: 4, Interesting

> I'm sorry but 5k for a little chip that makes my opteron a little faster? I could just buy another opteron for that price: http://www.pricewatch.com/cpu/419325-1.htm

The price is supposed to drop to 3k next year.

You're quite right that these are not for you; they're for running highly specialised calculations (the oil & gas industries are mentioned in TFA).

They make some operations much faster (think of a hardware MPEG decoder: useless for most things, but much more efficient at the single thing it can do than a general-purpose CPU).

> How does this affect cooling?

These things consume 10-20 watts compared to an Opteron's 80, so their effect on cooling is minimal (far less than adding the second Opteron you propose).

--
There are shills on slashdot. Apparently, I'm one of them.
Open protocols win! by maxwell+demon · 2006-04-24 00:19 · Score: 4, Insightful

I think the most important sentence in the article is this:
AMD's decision to open Hypertransport could end up being a key factor in Opteron's future success.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:neural networks or java? by Anonymous Coward · 2006-04-24 00:23 · Score: 4, Informative

Java co-processor: it has been tried before, with negative success. The main reason: it turns out that compiling byte-code to native CISC machine code and running that gives more speed than executing byte-code directly.
In the late '90s, I got burned at precisely such a start-up. We built an ASIC Java piggy-back byte-code CPU. It worked... as a proof of concept. It didn't give much of a performance boost: at best, in the 20-30% range. No one wanted it.
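The difference between executing byte-code directly and translating it first can be shown in miniature. The sketch below uses a toy stack machine with made-up opcodes (`push`, `load`, `add`, `mul`) and runs the same program two ways: an interpreter that decodes every instruction on every call, and a translator that turns the program into host-language code once. Here Python source passed to `eval` stands in for a JIT emitting native x86 code; this is an analogy, not how any real JVM works.

```python
def interpret(code, x):
    """Execute stack byte-code directly: one decode/dispatch per instruction, every call."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(x)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def compile_bytecode(code):
    """Translate the byte-code once into host code, then run that instead.

    Uses eval, so it is only suitable for trusted toy programs like this one.
    """
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(repr(arg))
        elif op == "load":
            stack.append("x")
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            sym = "+" if op == "add" else "*"
            stack.append(f"({a} {sym} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return eval(f"lambda x: {stack.pop()}")


# x*x + 2 as stack byte-code
prog = [("load", None), ("load", None), ("mul", None), ("push", 2), ("add", None)]
f = compile_bytecode(prog)
print(interpret(prog, 3), f(3))  # 11 11
```

The translated version pays the decoding cost once; every later call runs straight-line host code, which is the advantage a byte-code co-processor has to beat.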
Analog data analysis and general calculus, IMO by CFD339 · 2006-04-24 00:31 · Score: 4, Insightful

The sweet spot for a plug-in like this, IMO, would be similar to what a few board manufacturers are doing now: digital signal processing routines like Fourier transforms and other general calculus functions. These are used in all kinds of data analysis where raw data comes in as analog variations, or where moment-by-moment changes in state need to be modeled for engineering applications like fluid dynamics and harmonics.

I'd imagine you'll need to have the application compiled in such a way that it is aware of the additional processing capability, so it's not likely to be a plug-n-pray solution to your general game player's graphical wet dreams.
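Fourier transforms are the classic example of such a kernel: a fixed, regular pattern of multiply-adds that maps well onto dedicated hardware. As a plain-software reference point (not code for the DRC part, whose programming interface isn't described here), a textbook recursive radix-2 Cooley-Tukey FFT, checked against a direct DFT, looks like this:

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd half
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference to check fft()."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


# Usage: 8 samples of one cosine cycle; the energy lands in bins 1 and 7.
samples = [math.cos(2 * math.pi * j / 8) for j in range(8)]
spectrum = fft(samples)
print([round(abs(v), 6) for v in spectrum])
```

The butterfly loop in the middle is the part an accelerator would implement in parallel hardware; the recursion just schedules it.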

--
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
About Time! by evilviper · 2006-04-24 00:39 · Score: 4, Interesting

I have to say, I'm surprised it has taken so long. It seems a few years past due, IMHO.

One of the first signs that PCs needed an FPGA or similar was hardware MPEG capture cards... They could do the job so much faster, and so much more cheaply, than your primary CPU that the software alternative is disappearing.

High-end graphics cards have been the most telling development. It's not that OpenGL is something magical; it's just that an ASIC can do many things so much better than a CPU that transferring much, much more raw data over the bus was still cheaper than processing it on the CPU (despite the fact that interrupts are rather costly themselves).

PS2 clusters, crypto cards, hardware-accelerated NICs, SLI: all are symptoms of almost exactly the same problem...

The rising popularity of GPU programming made it extremely clear that there is a vacuum here. Using the video card isn't a very good way to accomplish this, just a stop-gap necessity. I thought from the beginning that FPGAs would become like the old math coprocessors and have their own motherboard socket, but neither AMD nor Intel was stepping up to fill this clear need. Installing it into a normal CPU socket, to get around this apathy, is a very clever hack I hadn't thought of.

I expect that, as these become popular, it will be cheaper to put a custom FPGA socket on motherboards than to build a full-fledged SMP motherboard for the purpose. After that, who knows... Maybe FPGAs will go the way of the math coprocessors and get integrated into future CPUs.

I know if I was running ATI or NVidia (or Hauppauge, or Level5), I'd be very worried about this thing eating the most profitable segment of my market.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
1. Re:About Time! by TheRaven64 · 2006-04-24 00:55 · Score: 4, Interesting
  I was at a talk by Bob Colwell a few weeks ago. One of the points he made was that within the next ten years we will be able to (economically) fit far more transistors on a chip than we realistically know what to do with. His example was using all of that space to have a vast array of P6 cores. If you did this, then:
  
  You would not be able to get enough power to the chip to make it work.
  
  You would not be able to dissipate the heat that it would generate.
  
  You would not be able to get enough data to it to keep more than about 10% of the chip's real estate (on a good day) actually doing anything.
  
  One possible solution is to have a hundred or so general purpose cores, and fill the rest up with simple algorithmic accelerators (e.g. FFT, crypto, [i]D{C,W}T, etc). These would spend most of their time turned off (not using power), but when a workload hit the chip that needed them they could be turned on to give a significant performance boost.
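  To give a feel for how small such an accelerator kernel can be, here is one level of the simplest discrete wavelet transform (the Haar DWT) and its inverse (iDWT). Haar is chosen here purely for illustration; the talk as reported doesn't say which transforms or wavelets such accelerators would implement.

```python
import math

S = 1 / math.sqrt(2)  # orthonormal Haar scaling factor


def haar_dwt(x):
    """One level of the Haar DWT: scaled pairwise sums (approximation) and differences (detail)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild each sample pair from its scaled sum and difference."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * S)
        out.append((a - d) * S)
    return out


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
approx, detail = haar_dwt(signal)
restored = haar_idwt(approx, detail)
```

  Each output depends on just two inputs and a constant, so a hardware block can compute every pair at once and sit powered off the rest of the time.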
  --
  I am TheRaven on Soylent News
An old method, not really suitable nowadays by Flying+pig · 2006-04-24 00:41 · Score: 4, Interesting

This worked fine in the days of embedded systems, when all memory was static (and usually only 16 bits wide) and it was easy to wire up an interrupt line so you could read the results when the add-on had finished. Nowadays it is much more difficult because of the need to integrate with DRAM controllers and timing, and the absence of convenient interrupts (so you need to poll a memory location to see when the job has completed). Hypertransport, by contrast, is designed to do the job, and do it efficiently.
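Without an interrupt line, the host is reduced to polling a completion flag. A minimal sketch of that pattern, with a Python thread standing in for the co-processor and a `StatusWord` object (a hypothetical name) standing in for a shared memory location; real hardware would also need uncached access or memory barriers, which this model ignores:

```python
import threading
import time


class StatusWord:
    """Hypothetical stand-in for a shared result slot plus a 'done' flag in memory."""

    def __init__(self):
        self.done = False
        self.result = None


def coprocessor_job(status, data):
    status.result = sum(v * v for v in data)  # the offloaded work
    status.done = True  # raise the flag only after the result has been written


def poll_for_result(status, interval=0.001, timeout=1.0):
    """No interrupt: re-read the flag, sleeping briefly between reads."""
    deadline = time.monotonic() + timeout
    while not status.done:
        if time.monotonic() > deadline:
            raise TimeoutError("co-processor did not finish in time")
        time.sleep(interval)
    return status.result


status = StatusWord()
worker = threading.Thread(target=coprocessor_job, args=(status, [1, 2, 3]))
worker.start()
print(poll_for_result(status))  # 14
worker.join()
```

The polling interval is the trade-off: shorter wastes CPU cycles on reads, longer adds latency after the job finishes.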
Another nice approach was the "swinging gate" RAM method in which you had two blocks of physical RAM in the same memory space. The main CPU filled one block with data, then flicked the switch so the co-processor could read that data while the CPU read the results from the other block, then put in new data for processing in the next cycle. Very easy to implement, much cheaper than FIFOs. It meant you could use a cheap DSP (from TI) in a system using a cheaper 8086 series processor for which you could get cheap tools and an embedded OS.
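This "swinging gate" is what is now usually called double buffering or ping-pong buffering. A minimal sketch of the protocol, assuming two Python threads as stand-ins for the CPU and the DSP and a `threading.Barrier` as a stand-in for the bank switch (both are illustrative choices, not a hardware model):

```python
import threading


def swinging_gate(batches, process, timeout=10.0):
    """Double-buffered ("ping-pong") offload between a CPU and a co-processor.

    In cycle k the co-processor owns RAM bank k % 2 and transforms it in place,
    while the CPU owns the other bank: it collects the results left there in the
    previous cycle and refills the bank with the next batch of input.
    """
    n = len(batches)
    if n == 0:
        return []
    banks = [list(batches[0]), []]  # the CPU pre-loads bank 0 before the first switch
    results = []
    # Both sides meeting at the barrier models "flicking the switch". The timeout
    # keeps the sketch from hanging forever if the worker thread dies.
    gate = threading.Barrier(2, timeout=timeout)

    def coprocessor():
        for k in range(n + 1):
            gate.wait()
            if k < n:
                b = k % 2
                banks[b] = [process(v) for v in banks[b]]

    worker = threading.Thread(target=coprocessor)
    worker.start()
    for k in range(n + 1):
        gate.wait()
        mine = (k + 1) % 2  # the bank the co-processor is not using this cycle
        if k >= 1:
            results.extend(banks[mine])  # results of batch k - 1
        if k + 1 < n:
            banks[mine] = list(batches[k + 1])  # input for the next cycle
    worker.join()
    return results


print(swinging_gate([[1, 2], [3], [4, 5, 6]], lambda v: v * v))  # [1, 4, 9, 16, 25, 36]
```

Because each side only ever touches its own bank between switches, no FIFO or per-item handshake is needed, which is why the scheme was so cheap to build.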

--
Pining for the fjords
FPGA vs. General Purpose CPU by DeadCatX2 · 2006-04-24 02:38 · Score: 4, Informative

Lots of other comments have made the point that it's not easy to program this kind of hardware. Typical software programs run in a very sequential manner. In fact, getting threads to cooperate in parallel is known to be a major sticking point in the average programmer's education.

Hardware, on the other hand, is massively parallel. All the "gates" (*) are running all the time. It's like multi-threading a program, taken to the extreme. And if it's designed correctly, this thing can scale beyond belief, since it's all parallel.

It's also important to note that it's a Virtex4 on that card. That's one hell of an FPGA, they sure aren't cutting any corners. I'm not sure which one they're using, but some Virtex4 chips have PowerPC processors at 450 MHz.

This is definitely a niche product for now, due mainly to the lack of people who can write code in Hardware Description Languages (HDLs). But if you can figure it out, and you have an application that works on a massive scale, this may be for you.

Oh, and for all you detractors who are saying "that thing only runs at 500 MHz! How is it supposed to be faster than my 2 GHz AMD chip?": you're forgetting one very important factor. Your AMD chip executes only a handful of instructions per clock cycle, and the important instructions are surrounded by instructions whose sole purpose is to control program flow or move data back and forth. However, the XtremeDSP slices of a Virtex4 can each execute a multiply and an add in a single cycle, there are up to 512 of them in the most hardcore Virtex4 chip, and other logic executing in parallel can control the "program flow" and ferry data back and forth across the bus.
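The back-of-the-envelope arithmetic behind that argument is simple. The FPGA figures below come from the comment (512 slices, 500 MHz, one multiply-add each per cycle); the CPU figure is a hypothetical assumption for comparison, not a measured Opteron number. These are peak rates only: sustained throughput depends on keeping the slices fed with data, which is the bus-bandwidth problem raised elsewhere in the thread.

```python
def peak_macs_per_second(units, clock_hz, macs_per_unit_per_cycle=1):
    """Peak multiply-accumulates per second if every unit is busy on every cycle."""
    return units * clock_hz * macs_per_unit_per_cycle


# Figures from the comment: 512 DSP slices at 500 MHz, one multiply-add each per cycle.
fpga = peak_macs_per_second(units=512, clock_hz=500e6)

# Hypothetical CPU figure for comparison: one 2 GHz core assumed to sustain
# two multiply-adds per cycle (an assumption, not a measured Opteron number).
cpu = peak_macs_per_second(units=1, clock_hz=2e9, macs_per_unit_per_cycle=2)

print(f"FPGA peak: {fpga:.3g} MAC/s, CPU peak: {cpu:.3g} MAC/s, ratio: {fpga / cpu:.0f}x")
```

Even granting the CPU a generous per-cycle rate, a quarter of the clock speed is more than made up for by the number of units working at once.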

*: Modern FPGAs are actually built out of small SRAM-based lookup tables (LUTs) that can implement arbitrary logic functions. They're no longer arrays of gates, so to speak.
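A lookup table is just a tiny memory addressed by the logic inputs. A toy model of a 4-input LUT (the size used in Virtex-4 logic cells) shows that "programming" it amounts to filling in a truth table, and "running" it is a single memory read. The class name and bit ordering are illustrative choices; real logic cells also include flip-flops, carry chains and routing, which this ignores.

```python
from itertools import product


class LUT4:
    """Model of a 4-input lookup table: 16 configuration bits, one per input combination."""

    def __init__(self, fn):
        # "Programming" the LUT: evaluate fn on all 16 input combinations, store the truth table.
        self.bits = 0
        for a, b, c, d in product((0, 1), repeat=4):
            if fn(a, b, c, d):
                self.bits |= 1 << self._index(a, b, c, d)

    @staticmethod
    def _index(a, b, c, d):
        # The four inputs form the address into the 16-bit table.
        return a | (b << 1) | (c << 2) | (d << 3)

    def __call__(self, a, b, c, d):
        # Evaluating the "logic" is just reading one stored bit.
        return (self.bits >> self._index(a, b, c, d)) & 1


xor4 = LUT4(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = LUT4(lambda a, b, c, d: a and b and c and d)
print(xor4(1, 0, 1, 1), and4(1, 1, 1, 1))  # 1 1
```

Any 4-input function, however awkward as a gate network, costs exactly the same: one table of 16 bits.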

--
:(){ :|:& };:
Might we ever have socketed Hypertransport GPU's? by Dr.+Spork · 2006-04-24 03:06 · Score: 4, Interesting

The fact that this is practical has made me wonder how well it would work to use a motherboard socket for a GPU. With Hypertransport it would have absolutely direct access to system ram and could help itself to as much as it needed. I would love to be able to use standard CPU heatsinks on a GPU.
But what I find really exciting about this idea is that once the GPU is on the motherboard, I'm sure programmers would find an easy way to use all that logic for calculations, say, media encoding. Heck, I know they are trying to do this with GPUs on cards, but this would be a much lower-latency connection.
I wonder how this would affect total system cost. I mean, I know multi-socket mobos will always cost more, but then again, when the GPU is a chip instead of a card, that should bring costs down. Also, they could ditch all that PCI-e logic and those slots. Upgrading would definitely be cheaper, and can you imagine two socketed GPUs on the mobo running a Hypertransport version of SLI? That might be the fastest, quietest gaming rig ever!