Slashdot Mirror


Low-cost Reconfigurable Computing (FPGAs)

Anonymous Coward writes: "People at the Chinese University of Hong Kong have developed a reconfigurable computing card which uses the SDRAM memory slot instead of the PCI bus. Measurements in the paper show greatly improved bandwidth and latency - why aren't more people using this idea?"

8 of 165 comments

  1. RAM-slot FPGAs by Frothy+Walrus · · Score: 4, Insightful

    the idea of FPGA computing has been around for a little while at least (look here for examples). i think Scientific American even wrote about "configurable computers" in 1997 or so. why aren't they more popular, then?

    modern processors are well-adapted to general computing tasks.

    FPGAs (read: custom iron) might be good for a few specialized tasks (breaking 3DES, for instance), but most of us will be a lot happier on our UltraSparcs and Athlons and G4s.

    1. Re:RAM-slot FPGAs by Nindalf · · Score: 3, Insightful

      modern processors are well-adapted to general computing tasks.

      This is a completely meaningless statement, because there's no such thing as a general computing task. Today's popular uses for computers developed as a result of the hardware's capabilities (which influenced the hardware's design, in an evolutionary feedback loop). We are only beginning to explore the uses of digital microcircuitry.

      Modern processors and modern programming methods are well adapted to each other, so one should expect that unorthodox hardware would be difficult to program and give poor results. We just don't have the experience for it.

      However, it becomes increasingly hard to get a consistent return on larger and larger surface-element counts with serial execution programming. Random memory accesses and conditional branching are discouraged in favor of "predictable" memory accesses and instruction execution, and greater and greater sacrifices of the illusion of serial execution are made in favor of efficiency. The advantages of parallelism grow as the chips grow, and reconfigurability at the level of the gate logic is the natural extreme we will likely tend toward as we figure out how to handle trillions of transistors in one device.

      Can you really imagine current design trends extrapolated to instruction pipelines millions deep? Serial execution does not scale infinitely.

    2. Re:RAM-slot FPGAs by jason_watkins · · Score: 2, Insightful

      The problem with using FPGAs for ray-tracing is that, when you fire a ray into the scene, you don't know what it's going to hit. If it hits something, then typically you examine illumination to that spot from all the light sources, and spawn another ray if you're doing reflection, transmission, or diffuse interreflection.

      The problem is, you can't parallelize this easily:

      for each pixel in image
        for each primitive in scene
          if ray through pixel hits primitive
            for each light
              if ray from light to hit doesn't hit something else first, calculate illumination
            calculate reflected color based on material
        write color to image
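
      A minimal Python sketch of the loop above, for anyone who wants something runnable. Everything in it is an illustrative assumption, not something from the post: spheres as the only primitive, an orthographic camera looking down +z, point lights, plain diffuse shading, and the names `hit_sphere` and `render`. It also keeps only the nearest hit per pixel, which the pseudocode leaves implicit.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest sphere hit, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = dot(direction, oc)
    c = dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 1e-6 else None

def render(width, height, spheres, lights):
    image = [[0.0] * width for _ in range(height)]
    for py in range(height):                      # for each pixel in image
        for px in range(width):
            # Assumed camera: parallel rays starting at z = -5, aimed along +z.
            x = (px + 0.5) / width * 2 - 1
            y = (py + 0.5) / height * 2 - 1
            origin, direction = (x, y, -5.0), (0.0, 0.0, 1.0)
            nearest, hit = math.inf, None
            for s in spheres:                     # for each primitive in scene
                t = hit_sphere(origin, direction, s["center"], s["radius"])
                if t is not None and t < nearest:
                    nearest, hit = t, s
            if hit is None:
                continue                          # background stays 0.0
            point = [o + nearest * d for o, d in zip(origin, direction)]
            normal = [(p - c) / hit["radius"] for p, c in zip(point, hit["center"])]
            color = 0.0
            for light in lights:                  # for each light
                to_light = [l - p for l, p in zip(light, point)]
                dist = math.sqrt(dot(to_light, to_light))
                ldir = [v / dist for v in to_light]
                # Shadow ray: is anything between the hit point and the light?
                blocked = False
                for s in spheres:
                    t = hit_sphere(point, ldir, s["center"], s["radius"])
                    if t is not None and t < dist:
                        blocked = True
                        break
                if not blocked:
                    color += hit["albedo"] * max(0.0, dot(normal, ldir))
            image[py][px] = color                 # write color to image
    return image
```

      Note that every pixel walks the whole primitive list, and every lit hit walks it again per light. That repeated scan of the scene is the cost the rest of this comment is about.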

      so let's say we equip each node of an FPGA with a program that evaluates this loop. If we have 1,000 of these nodes, that means we can, like, render 1,000 pixels at the same time, right?

      wrong. The scene is going to be far too large to store in each FPGA. So, each FPGA node is going to have to wander down the list of primitives in RAM to do its intersection tests. That is not fast.

      Now sure, you can set things up so that all the nodes are listening to one broadcast bus: all the primitives are listed off in a stream, and any node that finds a hit remembers it. After that you list the light sources, letting nodes calculate illumination at the hit, then let them process the material. Most likely they have to do some texture lookups here.
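
      A rough sketch of that reshuffled loop in Python, with the primitive stream on the outside and the per-pixel nodes on the inside. The `Node` class, the `broadcast` function, and the sphere-only scene are hypothetical stand-ins for illustration; the inner loop over nodes runs serially here, where the hardware being described would run it concurrently.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest sphere hit, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 1e-6 else None

class Node:
    """One hypothetical node: holds a single ray and its best hit so far."""
    def __init__(self, origin, direction):
        self.origin = origin
        self.direction = direction
        self.nearest = math.inf
        self.hit_id = None

    def listen(self, prim_id, center, radius):
        # Called once for every primitive that appears on the bus.
        t = hit_sphere(self.origin, self.direction, center, radius)
        if t is not None and t < self.nearest:
            self.nearest, self.hit_id = t, prim_id

def broadcast(nodes, spheres):
    """Stream every primitive past every node; return the bus transaction count."""
    transactions = 0
    for prim_id, (center, radius) in enumerate(spheres):
        transactions += 1          # one broadcast, shared by all nodes
        for node in nodes:         # concurrent in hardware, serial in this sketch
            node.listen(prim_id, center, radius)
    return transactions
```

      The scene is read once per frame instead of once per pixel, but every node still sees every primitive, including the ones nowhere near its ray.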

      So sure, that's a way of reshuffling the loop order and doing a lot of tests in parallel, but the real truth is, if you use some sort of spatial hierarchy on a general purpose CPU, it will be much faster.
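
      To make the spatial-hierarchy point concrete, here is a small Python sketch that counts intersection tests with and without a single level of bounding spheres. It is a deliberately minimal stand-in for a real hierarchy (octree, BVH and so on), and the grouping rule and the names `build_clusters`, `brute_force` and `with_hierarchy` are assumptions made for the example. It only counts tests; it says nothing about wall-clock speed.

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Both roots (t_near, t_far) of a unit-length ray against a sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (-b - root, -b + root)

def build_clusters(spheres, size):
    """Group consecutive spheres and wrap each group in a bounding sphere."""
    clusters = []
    for start in range(0, len(spheres), size):
        group = list(enumerate(spheres[start:start + size], start))
        n = len(group)
        mid = [sum(s[0][axis] for _, s in group) / n for axis in range(3)]
        bound = max(math.dist(mid, s[0]) + s[1] for _, s in group)
        clusters.append((mid, bound, group))
    return clusters

def nearest_hit(origin, direction, candidates, best=(math.inf, None)):
    """Test each (index, sphere) candidate; return the best hit and the test count."""
    tests = 0
    for index, (center, radius) in candidates:
        tests += 1
        roots = ray_sphere(origin, direction, center, radius)
        if roots is not None and 1e-6 < roots[0] < best[0]:
            best = (roots[0], index)
    return best, tests

def brute_force(origin, direction, spheres):
    return nearest_hit(origin, direction, list(enumerate(spheres)))

def with_hierarchy(origin, direction, clusters):
    best, tests = (math.inf, None), 0
    for mid, bound, group in clusters:
        tests += 1                      # one test against the bounding sphere
        roots = ray_sphere(origin, direction, mid, bound)
        if roots is None or roots[1] <= 0:
            continue                    # ray misses the whole cluster, skip its contents
        best, used = nearest_hit(origin, direction, group, best)
        tests += used
    return best, tests
```

      With four well-separated clusters of five spheres each, a ray through one cluster costs 20 tests the brute-force way and 9 with the bounding spheres, and both find the same hit.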

      Traditional Beowulf clusters are typically much better for this sort of thing, because each node can usually store a significant portion of the scene description locally, so there's no communication overhead that limits the parallelism.

      The deal with Final Fantasy is they didn't take into account subsurface scattering. Only recently have good models for that surfaced, and the computation time is prohibitive.

  2. Yeah but.... by Britano · · Score: 1, Insightful

    If this were so cool and such an awesome way to do computing, then why do we even have the PCI standard? They should make motherboards with 6 SDRAM slots instead of 6 PCI slots. They would help out SETI@Home!

    --
    Avoid The Rush, Hate OU Early!!!
  3. Speed and gates... by tcc · · Score: 4, Insightful

    Using FPGA technology to replace current processors (or rather, to make them "flashable") could be a great leap in computing. It would mean "soft" hardware upgrades: microcode or "silicon" bugs could be fixed in the field. But it would probably share the downside of everything else in the computing industry: companies would release buggy stuff, betas would go around like current drivers :), etc.

    All this said, unless some big breakthrough happens, we won't see our Athlon or Pentium IV systems replaced by these. The 2 main limitations of FPGAs are the number of available gates and the speed at which they operate.

    While they've managed to increase the number of gates to something quite big (last time I read about this I think it was in the low millions, 1 or 2, but I can't be sure), this is enough to "emulate" microcontrollers or lower-end processors, but not enough for higher-end microprocessors. Eventually they will catch up, and maybe someone will do his thesis on emulating an Athlon on an FPGA, but by that time we'll be at the 2nd or 3rd rev of post-Hammer processors, so it will look like being able to emulate a 486 today (granted, there could be some use in that, but none comes to mind right now... parallel processing? 1 Athlon can replace a zillion 486s). Also, the development of microprocessors is going at a faster pace than FPGA technology. I am not saying this couldn't happen, but it would need a serious bump in the fab process and technology to reach GHz speeds and probably a few hundred million gates.

    Still, it's a very interesting technology.

    --
    --- Metamoderating abusive downgraders since my 300th post.
  4. Using memory slots for devices is a bad idea by Skapare · · Score: 5, Insightful

    Using memory slots for devices is a bad idea. The interface is not designed for devices. There are no IRQ lines. The address space can be configured by the chipset to fall anywhere in the address space of the whole machine (your device may end up starting at 0). The address space may even be interleaved with other memory devices in other slots. And the next generation of memory will use a whole different interface, and most new motherboards will soon migrate to it with little concern for backward compatibility.
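
    To illustrate the interleaving point, here is a small Python sketch of a simple round-robin scheme. The scheme itself, the 64-byte granule, and the names `decode` and `cpu_ranges` are hypothetical; real chipsets use their own mappings, which this does not try to reproduce. It only shows why a card sitting in one slot cannot assume it owns a contiguous block of CPU addresses.

```python
def decode(addr, num_slots, granule):
    """Map a CPU physical address to (slot, offset inside that slot).

    Assumed scheme: consecutive `granule`-byte chunks rotate across the slots.
    """
    chunk, within = divmod(addr, granule)
    slot = chunk % num_slots
    offset = (chunk // num_slots) * granule + within
    return slot, offset

def cpu_ranges(slot, length, num_slots, granule):
    """CPU address ranges [start, end) that cover the first `length` bytes of a slot."""
    ranges = []
    for offset in range(0, length, granule):
        start = ((offset // granule) * num_slots + slot) * granule
        ranges.append((start, start + min(granule, length - offset)))
    return ranges
```

    With two slots and a 64-byte granule, `cpu_ranges(1, 128, 2, 64)` gives `[(64, 128), (192, 256)]`: the first 128 bytes of the card in slot 1 show up as two separate windows, with slot 0's memory in between.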

    --
    now we need to go OSS in diesel cars
  5. Re:Why aren't more people... by Anonymous Coward · · Score: 1, Insightful

    The memory bus is very poorly suited for stringing anything together. There are very strict assumptions on trace lengths from the connector to the memory chip, and more such restrictions, to be able to get the high bandwidth. If you have something you can connect to it locally, like these FPGAs, it's just about possible, but trying it with the highest-speed memory standards would be a formidable task.

    Something like HyperTransport is a lot better suited to high-bandwidth clustering; unfortunately AMD has not designed a port for it ... it's only for backplane use. Parallel unidirectional LVDS connections with forwarded clocks are the most balanced solution for high-bandwidth interconnects, and they're easy to use over cables (if you can solve the latency mismatch problems, which is possible with tapped delay lines). Intel's serial stuff is just plain icky: high latency and expensive silicon.

    But the powers that be have always resisted a cheap high-bandwidth non-local interconnect. SCI has been kept down by the man ... and although HyperTransport is alike in many ways, for some reason there isn't a specification for cable connections in the works.

    The industry does not want us to have cheap clusters with the same interconnect bandwidth as the ultra expensive heavy iron, there is too much money at stake ...

  6. um, because 6 PCI slots, 2-3 SDRAM..duh.. by Anonymous Coward · · Score: 1, Insightful

    Most mobos only come with 2-3 memory slots. 4 if you're lucky, more if you're paying through the nose for a server mobo.