Slashdot Mirror


Start-up Could Kick Opteron into Overdrive

An anonymous reader writes "The Register is reporting that a new start-up, DRC Computer, has created a reprogrammable co-processor that can slot directly into Opteron sockets. This new product has the potential to boost the Opteron chips well ahead of their Xeon-based competition. From the article: 'Customers can then offload a wide variety of software jobs to the co-processor running in a standard server, instead of buying unique, more expensive types of accelerators from third parties as they have in the past.'"

6 of 127 comments (clear)

  1. Why read a re-written press release by Threni · · Score: 4, Informative

    when you can just read about it on the company's website?

    http://www.drccomputer.com/pages/products.html

  2. A bit more accurate summary by subreality · · Score: 5, Informative

    They basically made a FPGA (field programmable gate array) that can plug directly into HyperTransport (the Opteron CPU bus). FPGAs let you efficiently solve many problems that a general purpose processor can't. This has been done with PCI cards before, but the PCI is too slow for many uses. Giving it direct access to HT solves that problem.

    That's a pretty cool niche.

  3. Re:neural networks or java? by Anonymous Coward · · Score: 4, Informative

    Java co-processor: it has been tried before, with negative success. Main reason: it turns out that compiling byte-code to CISC CPU assembler and running the native code gives more speed than executing byte-code directly.
    In late 90's, I've been burned off in precisely such start-up. We built an ASIC Java piggy-back byte-code CPU. It worked... as a proof of an idea. It didn't give much performance boost, at best, in 20-30% range. Noone wanted it.

  4. Re:Er.... question by kinnell · · Score: 3, Informative
    another Opteron would likely run at multiples of the clockspeed of that thing, and it would also be able to offload work from the *othewr* Opterons, such as disk I/O etc, giving your overall application more performance.

    Clockspeed is not a measurement of performance unless you are comparing similar architectures. With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.

    In practical terms, this product lends itself to compute intensive tasks such as signal processing, not data serving.

    --
    If I seem short sighted, it is because I stand on the shoulders of midgets
  5. Re:Er.... question by kinnell · · Score: 3, Informative

    The Virtex 4 FPGAs can be clocked at up to 500MHz, so we are talking about ~10-15 times slower than the processor, depending on the application. Even a simple digital filter would be faster when implemented in the FPGA, and this would only take a small fraction of the FPGA resources.

    --
    If I seem short sighted, it is because I stand on the shoulders of midgets
  6. FPGA vs. General Purpose CPU by DeadCatX2 · · Score: 4, Informative

    Lots of other comments have made clear the point that it's not easy to program this kind of hardware. Typical software programs run in a very sequential manner. In fact, trying to get cooperative parallel execution of threads is known to be a major sticking point in the average programmer's education.

    Hardware, on the other hand, is massively parallel. All the "gates" (*) are all running all the time. It's like multi-threading a program, taken to the limit of infinity. However, if designed correctly, this thing can scale beyond belief, since it's all parallel.

    It's also important to note that it's a Virtex4 on that card. That's one hell of an FPGA, they sure aren't cutting any corners. I'm not sure which one they're using, but some Virtex4 chips have PowerPC processors at 450 MHz.

    This is definitely a niche product for now, due mainly to the lack of people who can write code in Hardware Description Languages (HDLs). But if you can figure it out, and you have an application that works on a massive scale, this may be for you.

    Oh, and for all you detractors who are saying "that thing only runs at 500 MHz! How is it supposed to be faster than my 2 GHz AMD chip?" You're forgetting one very important factor. Your AMD chip executes one instruction at a time, and the important instructions are surrounded by instructions whose sole purpose is to control program flow or move data back and forth. However, the XtremeDSP slices of a Virtex4 can each execute a multiply and an add in a single cycle, and there are up to 512 of them in the most hardcore Virtex4 chip, and other logic executing in parallel can control the "program flow" and ferry data back and forth across the bus.

    *: Modern FPGAs are actually built out of SRAMs that can implement arbitrary logic functions. They're no longer arrays of gates, so to speak.

    --
    :(){ :|:& };: