Slashdot Mirror


Researchers Claim 1,000 Core Chip Created

eldavojohn writes "Remember a few months ago when the feasibility was discussed of a thousand core processor? By using FPGAs, Glasgow University researchers have claimed a proof of concept 1,000 core chip that they demonstrated running an MPEG algorithm at a speed of 5Gbps. From one of the researchers: 'This is very early proof-of-concept work where we're trying to demonstrate a convenient way to program FPGAs so that their potential to provide very fast processing power could be used much more widely in future computing and electronics. While many existing technologies currently make use of FPGAs, including plasma and LCD televisions and computer network routers, their use in standard desktop computers is limited. However, we are already seeing some microchips which combine traditional CPUs with FPGA chips being announced by developers, including Intel and ARM. I believe these kinds of processors will only become more common and help to speed up computers even further over the next few years.'"

18 of 118 comments

  1. Programmable CPU's by kge · · Score: 3, Interesting

    How long will it be before we will see the first motherboards with FPGA emerge?
    Then you can download the CPU type of your choice:

    -- naah, I don't like this new Intel core, I will try the latest AMD instead...

    1. Re:Programmable CPU's by Hal_Porter · · Score: 5, Informative

      A desktop CPU in an FPGA will always cost more and perform worse (i.e. slower clock rate) than a full custom chip from Intel or AMD. Mind you, I've seen embedded designs where a microcontroller, RAM, ROM and custom logic are implemented in a $10 FPGA - especially where volumes are too low for an ASIC.

      On the other hand I could definitely see programmable logic inside Intel or AMD CPUs, a sort of super SSE. Then again, even there you'd probably be better off using GPU-like custom hardware for the heavy lifting. In fact I can see CPU/GPU hybrids being very common in low-end machines. Full custom logic is always going to have a performance per $ advantage over FPGAs unless FPGA technology changes drastically.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    2. Re:Programmable CPU's by RKBA · · Score: 2

      FPGAs have been dynamically reprogrammable for years. You could load one with whatever special "hardware" custom instructions you wanted on the fly. Yes, custom logic is faster, but is inflexible.

    3. Re:Programmable CPU's by Lumpy · · Score: 2

      I'd like to see an FPGA PCI Express x1 daughterboard and an open, well-defined interface so that software can reconfigure and then use the FPGAs on the daughterboard for useful PC tasks....

      A game uses it for high-speed calculations, then DVD Fab uses it to crack Blu-ray encryption faster, then video encoding, audio encoding, then the browser uses it for encryption, etc....

      A nice open standard without greed attached so everyone can use it in their software. Although in the world of many cores not being used, I guess that's the way it will go instead.

      --
      Do not look at laser with remaining good eye.
    4. Re:Programmable CPU's by durrr · · Score: 2

      A non-FPGA AMD/Intel CPU will always be faster at general CPU business than an FPGA-implemented one doing the same.
      It is, however, a stupid approach: a CPU is built to do general-purpose calculations so that all software can exist without specialized hardware. An FPGA, on the other hand, is made to be configured into specialized hardware in order to... well, I guess not having to build a lot of prototypes for hardware testing was its original purpose. But its uses go far beyond that, in that it could turn into the specialized hardware that would make your program run oh-so-bloody-fast.
      Ideally we would have an FPGA that reconfigures instantly on the fly and a compiler that turns any and all C code into highly optimized FPGA code. Heck, even a small general-purpose FPGA area on motherboards, used by games and software as a limited hardware-optimization space, could speed up and enable great things, as there are tasks that are MUCH faster when implemented in hardware (I've heard figures of 100-1000 times acceleration for processing gene sequencing data, although I guess the 1000x value is for the $2k FPGA chips).

  2. Took long enough... by Crudely_Indecent · · Score: 2

    This story was already submitted two times before eldavojohn managed to get it to the front page in a little over an hour...

    http://tech.slashdot.org/submission/1432844/University-of-Glasgow-pioneers-1000-core-processor
    http://tech.slashdot.org/submission/1432512/1000-core-processors-

    --


    "Lame" - Galaxar
    1. Re:Took long enough... by seifried · · Score: 2

      Those two submissions are poorly written and have no real detail compared to this one (which is no gem, but is better).

  3. Does anyone have a link... by John Hasler · · Score: 2

    ...to a paper that assumes that the reader already knows what a cpu is? This article is content-free.

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  4. Life Cycle by glueball · · Score: 4, Interesting

    I think this is a great development. I've been using FPGAs in medical imaging for about 15 years. The groups that use GPUs are getting great performance--definitely--but seeing as how MRI and CT machines are installed and need to run for 10, 15, or 20 years, I don't see how the GPUs will survive that long. One large OEM was pushing GPUs for their architecture and I can't believe it will be successful if success is measured on the longevity scale. I'm sure the service sales guy will clean up.

    Why do GPUs fail? I'm not sure of the exact modes of failure but the amount of heat has got to have something to do with it. FPGAs will run much cooler and in the FLOPS/Watt game, will win.

    1. Re:Life Cycle by SuricouRaven · · Score: 3, Interesting

      I don't see why an MRI machine processor can't be made fault-tolerant. If a GPU burns out, it could just be disabled and a fault warning indicated - and then the machine can carry on working, even if it does take significantly longer to produce an image. Then you call tech support, they come around and pull the faulty part and slot in a new one. The only concern then is making sure parts are available in twenty years - and I imagine any machine that expensive has to come with a long-term support contract which will oblige the manufacturer to ensure a supply of compatible boards in years to come.

  5. Re:1,000 cores by Muad'Dave · · Score: 2

    My bet is 1,000 very simple cores - most decent-sized FPGAs contain tens or hundreds of thousands of 'logic blocks'. The Spartan 6 series has between 3,840 and 147,443 logic blocks.
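
    To put a rough number on "very simple": a back-of-the-envelope sketch that divides the block counts quoted above evenly among 1,000 cores. The even split is an assumption (nothing is set aside for routing or shared logic), and it says nothing about which part the researchers actually used.

```python
# Logic blocks available per core if 1,000 cores share one Spartan 6 part.
# Block counts are the two figures quoted in the comment; the even split
# is an assumption made purely for illustration.
CORES = 1000
SPARTAN6_BLOCKS = {"smallest": 3840, "largest": 147443}

def blocks_per_core(total_blocks: int, cores: int = CORES) -> float:
    """Evenly divide the device's logic blocks among the cores."""
    return total_blocks / cores

for name, blocks in SPARTAN6_BLOCKS.items():
    print(f"{name}: {blocks_per_core(blocks):.1f} blocks per core")
```

    Even on the largest part that leaves only on the order of a hundred-odd blocks per core, which fits the "very simple cores" guess.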

    --
    Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
  6. Disappointment by TheL0ser · · Score: 4, Funny

    The story's been up for 20 minutes and no one's tried to imagine a Beowulf cluster of them yet? This is a great sadness.

  7. Security issues by bluefoxlucid · · Score: 2

    A programmable hardware platform would provide amazing computing power because of hardware specialization: rather than emulating a proper CPU, you would download core architecture into the FPGA to accelerate tasks such as REGEX processing or H.264 decoding. You could compile the entire logic of a program into a gate array with various logical operators and flip-flop circuits for unlimited (albeit slow) registers (L2 registers) as well as including standard registers and SRAM cache (L1).

    Although the FPGA runs slower than a regular CPU, direct programming rather than instructional programming (that is logic blocks that perform programmatic functions, rather than logic blocks that interpret discrete instructions to follow programmatic functions) would shorten the overall hardware logic path. In short, the chip would follow fewer clock cycles and instead just "do things." The CPU would be slow, but optimized for your workload. The main performance bottleneck would be the context switch: replacing the logic gate configuration with a new program every time you switch. Other than that, dynamic program expansion could be utilized: inlining operations like multiplication, addition, etc, or breaking them out if space constraints make it hard to load the whole program onto the FPGA that way.
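
    A minimal sketch of what "logic blocks that perform programmatic functions" can mean in practice: a lookup table is filled with a function's truth table once, and evaluating the function afterwards is just an index, with no instruction decoding. The names (build_lut, lut_eval, add4) are made up for this example, and a real FPGA toolchain does far more than this.

```python
from itertools import product

def build_lut(func, n_inputs):
    """Precompute func for every input combination, like configuring a LUT."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]

def lut_eval(lut, *bits):
    """Evaluate by indexing; the first argument is the most significant bit."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return lut[index]

# A full adder expressed as two 3-input tables (sum and carry-out).
SUM_LUT = build_lut(lambda a, b, cin: a ^ b ^ cin, 3)
CARRY_LUT = build_lut(lambda a, b, cin: (a & b) | (cin & (a ^ b)), 3)

def add4(x, y):
    """Ripple-carry add of two 4-bit values using only table lookups."""
    carry, result = 0, 0
    for i in range(4):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= lut_eval(SUM_LUT, a, b, carry) << i
        carry = lut_eval(CARRY_LUT, a, b, carry)
    return result, carry

print(add4(5, 6))  # 4-bit sum and carry-out
```

    In hardware the four adder stages would exist side by side rather than being visited in a loop; the loop here only stands in for the wiring between them.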

    The obvious, major issue we see is, of course, a security issue. You can now reprogram the CPU. This makes it difficult to prevent a program from bypassing any and all hardware security measures. This is solved by implementing a completely new security design on the chip, by which the CPU itself (the FPGA) is under control of external security mechanisms (paging etc handled in the MMU, outside the FPGA space, would largely mitigate most of this); it's not impossible to deal with, it's just an issue that needs to be raised.

    In short, this sucks for "download the new Intel CPU into your BIOS/bootloader." This sucks for whatever general purpose CPU you can think of. For an entirely new programmatic platform, however, this would provide some interesting performance possibilities, and some interesting challenges.

  8. Re:FPGA vs. GPU? by raftpeople · · Score: 2

    Not all problems map well to current GPU offerings. I have a problem that would benefit from parallel processing, but due to a branchy algorithm and very random read/write access, I can't really take advantage of GPUs to the extent some algorithms can. (Note: I have coded and run it on GPUs, so this is more than just theory. I have also coded it to run on a network of computers, and unfortunately the ratio of calc time to network transmission time for each cycle is not favorable enough for that to be a very good solution either. The best solution is many cores accessing the same memory.)

    For this particular problem, a large number of minimally functional "cpus" or "cores" would be ideal: just some basic math, logic and branching. An FPGA is one way to try to achieve something like this.
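
    A toy model of why branchy code suits many small independent cores better than lockstep hardware. It assumes a group of lanes that must run every path any lane takes, one path after another, versus cores that each run only their own path. The path names and cycle costs are invented, and memory access patterns are ignored entirely.

```python
def lockstep_cycles(lane_paths, path_cost):
    """Lanes share one instruction stream: every distinct path taken is run in turn."""
    return sum(path_cost[p] for p in set(lane_paths))

def independent_cycles(lane_paths, path_cost):
    """Each core runs only its own path; the slowest core sets the finish time."""
    return max(path_cost[p] for p in lane_paths)

# Hypothetical per-path costs in cycles.
path_cost = {"a": 10, "b": 20, "c": 30}

uniform = ["a", "a", "a", "a"]    # every lane takes the same branch
divergent = ["a", "b", "c", "a"]  # lanes scatter across branches

for name, lanes in (("uniform", uniform), ("divergent", divergent)):
    print(name, lockstep_cycles(lanes, path_cost), independent_cycles(lanes, path_cost))
```

    When every lane agrees the two models cost the same; the gap only opens up once the lanes diverge.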

  9. Re:first by smallfries · · Score: 2

    Sigh. Multi-way branching was already old when ARM implemented it. What you fail to explain (understand?) is that there is a cost associated with either choice. As with most of engineering there is not a simple proposal that wins. In the case that branch prediction is perfect, the predicted execution is cheaper. In the case that the prediction is terrible the multi-way execution wins. In real life branch prediction is neither perfect, nor is it that terrible, so engineers have to balance the likelihood that one technique will be better for a given type of code against the probability that the processor will execute that type of code.

    Guess where multi-way branching wins? In small working-set loads typical of embedded processing applications. Guess where branch prediction wins? On a more general set of benchmarks typical of desktop computing.
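
    The tradeoff can be put into a toy cost model. This reads "multi-way execution" as running both sides of a branch and discarding one, which is an assumption about what is meant above, and every cycle count is invented for illustration.

```python
def predicted_cost(accuracy: float, mispredict_penalty: float) -> float:
    """Average extra cycles per branch when the processor guesses one path."""
    return (1.0 - accuracy) * mispredict_penalty

def both_paths_cost(wasted_path_cycles: float) -> float:
    """Extra cycles per branch when both paths run and one result is thrown away."""
    return float(wasted_path_cycles)

def break_even_accuracy(wasted_path_cycles: float, mispredict_penalty: float) -> float:
    """Predictor accuracy above which prediction becomes the cheaper choice."""
    return 1.0 - wasted_path_cycles / mispredict_penalty

penalty, wasted = 15.0, 3.0  # hypothetical numbers
for accuracy in (0.5, 0.8, 0.95):
    print(accuracy, predicted_cost(accuracy, penalty), both_paths_cost(wasted))
print("break-even accuracy:", break_even_accuracy(wasted, penalty))
```

    Short branch bodies push the break-even point up, which is consistent with the small, tight loops of embedded code favouring the multi-way approach.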

    Your other point is equally incorrect - the decoding overhead for x86 is now minimal (a few percent of the size of the core). However the x86 instruction set is very good at packing lots of code into a small amount of space, which given the effectiveness of the x86 instruction cache is why it destroyed most of the pure-RISC competition. Those unused instructions are not "sitting in the silicon" as you put it rather idiotically. What they are doing is sitting in longer instruction words in the x86 encoding allowing the more frequently used instructions to be encoded in less space. It's a very simple form of compression, and as with other engineering tradeoffs it can win and lose in different circumstances but in the particular case of most x86 benchmarks it beats other instruction encodings.

    Seriously, x86 is SO FREAKIN' OLD, it has been finely tuned and matured with age.

    --
    Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  10. And, to think... by GodfatherofSoul · · Score: 2

    Ten years ago some young 6-digit ID Slashdotter was getting modded down for suggesting a Beowulf cluster of cores. Who's laughing now, mods?!?!?

    --
    I swear to God...I swear to God! That is NOT how you treat your human!
  11. What about an all core chip? by ka9dgx · · Score: 2

    The ultimate end to this trend is to build a system that is just core processing logic, with logic and memory all fused as closely as possible. I call it the BitGrid... it consists of 4-bit lookup tables hooked into an orthogonal grid. Because every single table can be used simultaneously, there is no von Neumann bottleneck to worry about.
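
    A minimal simulation sketch of such a grid. The details are assumptions made for illustration, not a description of the actual BitGrid design: each cell reads one bit from each of its four neighbours, holds one 16-entry table per output direction, and all cells update at the same time. Inputs at the grid edge read as 0.

```python
N, E, S, W = range(4)  # direction indices for a cell's four outputs

def make_lut(func):
    """16-entry table for a function of the four input bits (n, e, s, w)."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(16)]

ZERO = make_lut(lambda n, e, s, w: 0)
PASS_WEST = make_lut(lambda n, e, s, w: w)  # output copies the west input

def step(outputs, luts):
    """Advance every cell one tick, all using the previous tick's outputs."""
    rows, cols = len(outputs), len(outputs[0])

    def out(r, c, d):
        return outputs[r][c][d] if 0 <= r < rows and 0 <= c < cols else 0

    new = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Each input is the neighbouring cell's output that faces this cell.
            n, e = out(r - 1, c, S), out(r, c + 1, W)
            s, w = out(r + 1, c, N), out(r, c - 1, E)
            index = (n << 3) | (e << 2) | (s << 1) | w
            row.append([luts[r][c][d][index] for d in range(4)])
        new.append(row)
    return new

# A 1x4 row of cells wired as a shift register: east output = west input.
luts = [[[ZERO, PASS_WEST, ZERO, ZERO] for _ in range(4)]]
state = [[[0, 0, 0, 0] for _ in range(4)]]
state[0][0][E] = 1  # inject a single bit at the left end
for _ in range(2):
    state = step(state, luts)
print([cell[E] for cell in state[0]])
```

    Every table lookup in a tick is independent of the others, which is the property the comment is pointing at; the Python loops are only a serial stand-in for that.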

    Petaflops... here we come.... !