Slashdot Mirror


Why 'Gaming' Chips Are Moving Into the Server Room

Esther Schindler writes "After several years of trying, graphics processing units (GPUs) are beginning to win over the major server vendors. Dell and IBM are the first tier-one server vendors to adopt GPUs as server processors for high-performance computing (HPC). Here's a high level view of the hardware change and what it might mean to your data center. (Hint: faster servers.) The article also addresses what it takes to write software for GPUs: 'Adopting GPU computing is not a drop-in task. You can't just add a few boards and let the processors do the rest, as when you add more CPUs. Some programming work has to be done, and it's not something that can be accomplished with a few libraries and lines of code.'"

44 of 137 comments

  1. A whole new level of parallelism by TwiztidK · · Score: 4, Insightful

    I've heard that many programmers have issues coding for 2 and 4 core processors. I'd like to see how they'll adapt to running "hundreds of threads" in parallel.

    --
    Sent from my iPhone 5
    1. Re:A whole new level of parallelism by morcego · · Score: 3, Insightful

      This is just like programming for a computer cluster ... after a fashion.

      Anyone used to doing both should have no problem with this.

      I'm anything but a high end programmer (I mostly only code for myself), and I have written plenty of code that runs with 7-10 threads. Believe me, when you change the way you think about how an algorithm works, it doesn't matter if you are using 3 or 10000 processors.

      --
      morcego
    2. Re:A whole new level of parallelism by Austerity+Empowers · · Score: 2, Insightful

      CUDA or OpenCL is how they do it.

    3. Re:A whole new level of parallelism by Sax+Maniac · · Score: 3, Insightful

      This isn't hundreds of threads that can run arbitrary code paths like a CPU, you have to totally redesign your code, or already have implemented parallel code so that you already run a number of threads that all do the same thing at the same time, just on different data.

      The threads all run in lockstep, as in, all the threads better be at the same PC at the same time. If you run into a branch in the code, then you lose your parallelism, as the divergent threads are frozen until they come back together.

      I'm not a big thread programmer, but I do work on threading tools. Most of the problems with threads seem to come from threads running totally different code paths, and the unpredictable scheduling interactions that arise between them. GPU coding is a lot more tightly controlled.
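      To put rough numbers on the lockstep point, here is a toy cost model in Python. It is only a sketch: the function name, the group size of 32 and the per-branch step counts are made up for illustration, and it assumes the two sides of a branch are simply executed one after the other whenever the group disagrees. Real hardware is more subtle than this.

```python
def lockstep_steps(predicates, then_cost, else_cost):
    """Steps a lockstep group spends on one if/else (toy model).

    predicates: one boolean per thread in the group.
    The group pays for a side of the branch if at least one thread
    takes it; threads on the other side sit idle meanwhile.
    """
    takes_then = any(predicates)
    takes_else = not all(predicates)
    return then_cost * takes_then + else_cost * takes_else


# Hypothetical group of 32 threads, each side of the branch costing 10 steps.
uniform = lockstep_steps([True] * 32, then_cost=10, else_cost=10)
divergent = lockstep_steps([i % 2 == 0 for i in range(32)],
                           then_cost=10, else_cost=10)
print(uniform, divergent)  # prints "10 20"
```

      Under these assumptions a group that agrees on the branch pays for one side, while a group that splits pays for both, no matter how few threads went the other way.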

      --
      I can explanate how to administrate your network. You must configurate and segmentate it, so it can computate.
    4. Re:A whole new level of parallelism by Nadaka · · Score: 4, Insightful

      No it isn't. That you think so just shows how much you still have left to learn.

      I am not a high end programmer either. But I have two degrees on the subject and have been working professionally in the field for years, including optimization and parallelization.

      Many algorithms just won't have much improvement with multi-threading.

      Many will even perform more poorly due to data contention and the overhead of context switches and creating threads.

      Many algorithms just can not be converted to a format that will work within the restrictions of GPGPU computing at all.

      The stream architecture of modern GPUs works radically differently than a conventional CPU.

      It is not as simple as scaling conventional multi-threading up to thousands of threads.

      Certain things that you are used to doing on a normal processor have an insane cost in GPU hardware.

      For instance, the if statement. Until recently OpenCL and CUDA didn't allow branching. Now they do, but they incur such a huge penalty in cycles that it just isn't worth it.

    5. Re:A whole new level of parallelism by Dynetrekk · · Score: 5, Insightful

      Believe me, when you change the way you think about how an algorithm works, it doesn't matter if you are using 3 or 10000 processors.

      Have you ever read up on Amdahl's law?

    6. Re:A whole new level of parallelism by pushing-robot · · Score: 3, Funny

      Microsoft must be doing a bang-up job then, because when I'm in Windows it doesn't matter if I'm using 3 or 10000 processors.

      --
      How can I believe you when you tell me what I don't want to hear?
    7. Re:A whole new level of parallelism by jgagnon · · Score: 3, Interesting

      The problem with "programming for multiple cores/CPUs/threads" is that it is done in very different ways between languages, operating systems, and APIs. There is no such thing as a "standard for multi-thread programming". All the variants share some concepts in common but their implementations are mostly very different from each other. No amount of schooling can fully prepare you for this diversity.

      --
      Remember to maintain your supply of /facepalm oil to prevent chafing.
    8. Re:A whole new level of parallelism by Chris+Burke · · Score: 4, Informative

      Programmers of Server applications are already used to multithreading, and they've been able to make good use of systems with large numbers of processors on them even before the advent of virtualization.

      But don't pay too much attention to the word "Server". Yes the machines that they're talking about are in the segment of the market referred to as "servers", as distinct from "desktops" or "mobile". But the target of GPU-based computing isn't "Servers" in the sense of the tasks you normally think of -- web servers, database servers, etc.

      The real target is mentioned in the article, and it's HPC, aka scientific computing. Normal server apps are integer code, and depend more on high memory bandwidth and I/O, which GPGPU doesn't really address. HPC wants that stuff too, but they also want floating point performance. As much floating point math performance as you can possibly give them. And GPUs are way beyond what CPUs can provide in that regard. Plus a lot of HPC applications are easier to parallelize than even the traditional server codes, though not all fall in the "embarrassingly parallel" category.

      There will be a few growing pains, but once APIs get straightened out and programmers get used to it (which shouldn't take too long for the ones writing HPC code), this is going to be a huge win for scientific computing.

      --

      The enemies of Democracy are
    9. Re:A whole new level of parallelism by Hodapp · · Score: 3, Informative

      I am one such programmer. Yet I also coded for an Nvidia Tesla C1060 board and found it much more straightforward to handle several thousand threads at once.

      Not all types of threads are created equal. I usually explain CUDA to people as the "Zerg Rush" model of computing - instead of a couple, well-behaved, intelligent threads that try to be polite to each other and clean up their own messes, you throw a horde of a thousand little vicious, stupid threads at the problem all at once, and rely on some overlord to keep them in line.

      Most of the guides explained it as, "Flops are free, bandwidth is expensive." This board had a 384 or 512-bit wide memory bus with a very high latency, and the reason you throw that many threads at it is to let the hardware cover up the latency - it can merge a huge number of memory reads/writes into one operation, and as soon as a thread is waiting on memory I/O it can swap another thread into that same SP and let it compute. If memory serves me, the board was divided into blocks of 8 scalar processors (each block had some scratchpad memory that could be accessed almost as fast as a register) and you wrote groups of 16 threads which ran in lock-step on that processor (no recursion was allowed, and if one branched, the others would just wait around until it reached the same point) in two rounds.

      Sure, that's a bit complex to optimize for, but it beats the hell out of conventional threading while trying to optimize for x86 SIMD. And if you manage to write it so it runs well on CUDA, it generally will scale effortlessly to whatever card you throw it at.

      It's looking like OpenCL won't be much different, but I have yet to try it. I'm kind of eager, since apparently AMD/ATI's current cards, for the money, have a bit more raw power than Nvidia's.

    10. Re:A whole new level of parallelism by Lord+of+Hyphens · · Score: 2, Interesting

      Have you ever read up on Amdahl's law?

      I'll see your Amdahl's Law, and raise you Gustafson's Law.
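      For anyone who hasn't met the two laws, a small Python sketch of the usual textbook formulas follows. The function names and the 95% parallel fraction are my own choices for illustration. Note that the two laws define the fraction differently: Amdahl measures it on the serial run of a fixed-size problem, Gustafson measures it on the parallel run of a problem that grows with the machine.

```python
def amdahl_speedup(parallel_fraction, n):
    """Fixed problem size: the serial part limits the speedup."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def gustafson_speedup(parallel_fraction, n):
    """Problem size grows with n: scaled speedup keeps climbing."""
    return (1.0 - parallel_fraction) + parallel_fraction * n


# Hypothetical workload that is 95% parallel.
for n in (3, 100, 10000):
    print(n, amdahl_speedup(0.95, n), gustafson_speedup(0.95, n))
```

      With a fixed-size problem the Amdahl figure can never reach 20x at 95% parallel, however many processors you add. The Gustafson figure keeps growing because the extra processors are assumed to be given more work.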

      --
      "I've spent my whole life figuring out crazy ways to do things. It'll work." -- Montgomery Scott, "Relics"
    11. Re:A whole new level of parallelism by Fulcrum+of+Evil · · Score: 2, Insightful

      most post secondaries are now teaching students how to properly thread for parallel programming.

      No they aren't. Even grad courses are no substitute for doing it. Never mind that parallel processing is a different animal than SIMD-like models that most GPUs use.

      I haven't had to deal with any of it myself, but I imagine it'll boil down to knowing what calculations in your program can be done simultaneously, and then setting up a way to dump it off onto the next available core.

      No, it's not like that. You set up a warp of threads running the same code on different data and structure it for minimal branching. That's the thumbnail sketch - nvidia has some good tutorials on the subject and you can use your current GPU.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    12. Re:A whole new level of parallelism by Anonymous Coward · · Score: 2, Interesting

      You might find this Google Tech Talk interesting..

    13. Re:A whole new level of parallelism by sarkeizen · · Score: 2, Informative

      Personally (and I love that someone below mentioned Amdahl's law), I think the problem isn't, as you said, about specific language constructs, but that there isn't any general solution to parallelism. That is, to use Brooks's illustration, the problems we try to solve with computers aren't like harvesting wheat - they aren't efficiently divisible to an arbitrary degree. We do know of a few problems like this, which we call "embarrassingly parallel", but they are few and far between. So GPUs are great MD5 crackers and protein folders, and I personally *love* writing CUDA code, but I don't suffer from the delusion that this is somehow a revolution in software, or that the usual day-to-day tasks are going to be affected. So the idea that GPUs are moving into the server room seems optimistic, because the majority of stuff in there is pretty mundane.

      That said, I wonder whether GPUs have architectural limitations, e.g. a lack of memory protection, and whether adding those features to use them for general-purpose computing would cost performance. In other words, are we just making some kind of cores-to-features tradeoff?

    14. Re:A whole new level of parallelism by psilambda · · Score: 3, Interesting

      The article and everybody else are ignoring one large, valid use of GPUs in the data center--whether you call it business intelligence or OLAP--it needs to be in the data center and it needs some serious number crunching. There is not as much difference between this and scientific number crunching as most people might think. I have been involved in both crunching numbers for financials at a major multinational and had the privilege of being the first to process the first full genome (complete genetic sequence--terabytes of data) for a single individual and actually the genomic analysis was much more integer based than the financials. Based on my experience with both, I created the Kappa library for doing CUDA or OpenMP analysis in a datacenter--whether for business or scientific work.

    15. Re:A whole new level of parallelism by David+Greene · · Score: 4, Interesting

      The stream architecture of modern GPUs works radically differently than a conventional CPU.

      True if the comparison is to a commodity scalar CPU.

      It is not as simple as scaling conventional multi-threading up to thousands of threads.

      True. Many algorithms will not map well to the architecture. However, many others will map extremely well. Many scientific codes have been tuned over the decades to exploit high degrees of parallelism. Often the small data sets are the primary bottleneck. Strong scaling is hard, weak scaling is relatively easy.

      Certain things that you are used to doing on a normal processor have an insane cost in GPU hardware.

      In a sense. These are not scalar CPUs and traditional scalar optimization, while important, won't utilize the machine well. I can't think of any particular operation that's greatly slower than on a conventional CPU, provided one uses the programming model correctly (and some codes don't map well to that model).

      For instance, the if statement.

      No. Branching works perfectly fine if you program the GPU as a vector machine. The reason branches within a warp (using NVIDIA terminology) are expensive is simply because a warp is really a vector. The GPU vendors just don't want to tell you that because either they fear being tied to some perceived historical baggage with that term or they want to convince you they're doing something really new. GPUs are interesting, but they're really just threaded vector processors. Don't misunderstand me, though, it's a quite interesting architecture to work with!

      --

    16. Re:A whole new level of parallelism by Anonymous Coward · · Score: 2, Interesting

      I've heard that many programmers have issues coding for 2 and 4 core processors. I'd like to see how they'll adapt to running "hundreds of threads" in parallel.

      If that's the paradigm they're operating in, it will probably fail spectacularly. Let me explain why.

      In the end, GPUs are essentially vector processors (yes, I know that's not exactly how they work internally, but bear with me). You feed them one or more input vectors of data and one or two storage vectors for output and they do the same calculation on every element of the input and store the results in the output. Think about what you need for pixel rendering: it's things like "apply a fixed affine transform to every pixel of the input image and store the results as the output image" or "add [alpha blend] these two images together and store the result." These are the kind of tasks vector processors like the old Crays were designed to implement efficiently; compilers implementing OpenMP are also working within this kind of paradigm.
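      As a rough picture of that "same calculation on every element" shape, here is the alpha blend written in plain Python. The function name and the sample values are hypothetical, and a list comprehension runs serially, so this only shows the structure of the work rather than how a GPU executes it.

```python
def alpha_blend(src, dst, alpha):
    """Blend two equal-length pixel vectors element by element.

    Every output element depends only on the matching input elements,
    which is what lets a vector machine compute them all side by side.
    """
    if len(src) != len(dst):
        raise ValueError("input vectors must have the same length")
    return [alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst)]


# Two tiny hypothetical "images" with intensities between 0.0 and 1.0.
print(alpha_blend([0.0, 1.0], [1.0, 1.0], 0.5))  # prints "[0.5, 1.0]"
```

      No element needs to know what happened to its neighbours, so there is nothing to lock and nothing to schedule.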

      Threads, in contrast to vector processing, are independent streams of execution. While you can use threads to split a loop into pieces, the normal thread pattern is something more like "wait for an event, and then respond to it appropriately." The real problem here is that because threads are independent tasks, memory sharing is hard (semaphores, spin locks, and all that) because you can't guarantee the behavior of any other thread.

      Clusters, finally, as a few people have mentioned (although perhaps never used), are different yet again. While each node in a cluster runs as an independent machine and thus conceptually resembles a thread, the nodes don't have a pool of shared memory (they may not even have shared disk space!). If I want to get data from node A to node B, I have to copy it over the network. Because the internal bandwidth of a cluster is so much lower than the memory bus of a shared-memory computer, you spend most of your time figuring out how to minimize the amount of data you have to copy between nodes and worrying about things like cluster topology. As a result, algorithms that scale well on a shared-memory machine may or may not scale well at all on a distributed cluster.

      So why bother? Because each design has its own strengths and weaknesses. Vector processors are great if you're doing a vector operation, but things like stream processing (e.g., compressing video data) don't vectorize particularly well. Threads are generic and flexible; so flexible that you can't really optimize the hardware for them. They also require discipline to avoid dead-locks and other related problems. Clusters, finally, are inexpensive and are ideally suited for "batch" tasks like web servers or databases where each thread really is an independent job, but for things like weather simulations (where lots of data has to be exchanged between nodes) they require very careful attention to the algorithms used or the performance can tank as the size of the system gets large.

  2. Good luck with that by tedgyz · · Score: 3, Insightful

    This is a long-standing issue. If your programs don't just "magically" run faster, then count out 90% or more of the programs that will benefit from this.

    --
    "No matter where you go, there you are." -- Buckaroo Banzai
  3. Yes, of course by Anonymous Coward · · Score: 2, Funny

    The sysadmins need new machines with powerful GPUs, you know, for business purposes.

    Oh and, they sell ERP software on Steam now, too, so we'll have to install that as well.

    1. Re:Yes, of course by Yvan256 · · Score: 5, Funny

      Portal 2? It's something for our Web server. It adds more portals to access the internet.

  4. CUDA by Lord+Ender · · Score: 3, Informative

    I was interested in CUDA until I learned that even the simplest of "hello world" apps is still quite complex and quite low-level.

    NVidia needs to make the APIs and tools for CUDA programming simpler and more accessible, with solid support for higher-level languages. Once that happens, we could see adoption skyrocket.

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    1. Re:CUDA by Rockoon · · Score: 4, Interesting

      Indeed. With Cuda, DirectCompute, and OpenCL, nearly 100% of your code is boilerplate interfacing to the API.

      There needs to be a language where this stuff is a first-class citizen and not just something provided by an API.

      --
      "His name was James Damore."
    2. Re:CUDA by cgenman · · Score: 2, Interesting

      While I don't disagree that NVIDIA needs to make this simpler, is that really a sizeable market for them? Presuming every college will want a cluster of 100 GPU's, they've still got about 10,000 students per college buying these things to game with.

      I wonder what the size of the server room market for something that can't handle IF statements really would be.

    3. Re:CUDA by psilambda · · Score: 2, Informative

      Indeed. With Cuda, DirectCompute, and OpenCL, nearly 100% of your code is boilerplate interfacing to the API. There needs to be a language where this stuff is a first-class citizen and not just something provided by an API.

      If you use CUDA, OpenCL or DirectComputeX it is--try the Kappa library--it has its own scheduling language that makes this much easier. The next version that is about to come out goes much further yet.

    4. Re:CUDA by BitZtream · · Score: 2, Informative

      GCD combined with OpenCL makes it usable on a GPU, but that would be stupid. GPUs aren't really 'threaded' in any context that someone who hasn't worked with them would think of.

      All the threads run simultaneously, and side by side. They all start at the same time and they all end at the same time in a batch (not entirely true, but it is if you want to actually get any boost out of it).

      GCD is multithreading on a General Processing Unit, like your Intel CoreWhateverThisWeek processor. Code paths are run and scheduled on different cores as needed and don't really run side by side, but they can run at the same time which is practical and useful in A LOT of cases.

      OpenCL is multithreading on a graphics chip. It lets you do the same calculation over and over again or on a very large data set, side by side. You can calculate 128 encryption keys in one pass, but you can't calculate one encryption key, the average of your monthly bills, and draw a circle because the graphics chip doesn't do random processing side by side, it runs a whole bunch of the same instructions side by side and goes to hell in a handbasket the INSTANT you break its ability to run all the 'threads' side by side, executing the same instruction in each at the same time.

      I really don't think you understand either standard GP multithreading or what GPUs are practically capable of doing.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  5. OpenCL by gbrandt · · Score: 2, Informative

    Sounds like a perfect job for OpenCL. When a program is rewritten for OpenCL, you can just drop in CPU's or GPU's and they get used.

    1. Re:OpenCL by Anonymous Coward · · Score: 3, Informative

      Unfortunately, no. OpenCL does not map equally to different compute devices, and does not enforce uniformity of parallelism approaches. Code written in OpenCL for CPUs is not going to be fast on GPUs. Hell, OpenCL code written for ATI GPUs is not going to work well on nVidia GPUs.

  6. Of course not! by Yvan256 · · Score: 2, Informative

    It's not something that can be accomplished with a few libraries and lines of code.

    It doesn't take a few libraries and lines of code... It takes a SHITLOAD of libraries and lines of code! - Lone Starr

  7. Libraries by Dynetrekk · · Score: 2, Insightful

    I'm really interested in using GPGPU for my physics calculations. But you know - I don't want to learn Nvidia's low-level, proprietary (whateveritis) in order to do an addition or multiplication, which may or may not outperform the CPU version. What would be _really_ great is stuff like porting the standard "low-level numerics" libraries to the GPU: BLAS, LAPACK, FFTs, special functions, and whatnot - the building blocks for most numerical programs. LAPACK+BLAS you already get in multicore versions, and there's no extra work on my part to use all cores on my PC. Please, computer geeks (i.e. more computer geek than myself), let me have the same on the GPU. When that happens, we can all buy Nvidia HotShit gaming cards and get research done. Until then, GPGPU is for the superdupergeeks.

    1. Re:Libraries by brian_tanner · · Score: 3, Informative

      It's not free, unfortunately. I briefly looked into using it but got distracted by something shiny (maybe trying to finish my thesis...)

      CULA is a GPU-accelerated linear algebra library that utilizes the NVIDIA CUDA parallel computing architecture to dramatically improve the computation speed of sophisticated mathematics.
      http://www.culatools.com/

    2. Re:Libraries by Anonymous Coward · · Score: 2, Informative

      It's not as complete as CULA, but for free there is also MAGMA. Also, nVidia implements a CUDA-accelerated BLAS (CUBLAS) which is free.

      As far as OpenCL goes, I don't think there has been much in terms of a good BLAS made yet. The compilers are still sketchy (especially for ATI GPUs), and the performance is lacking on nVidia GPUs compared to CUDA.

    3. Re:Libraries by guruevi · · Score: 2, Informative

      The CUDA dev kit includes libraries and examples for BLAS (CUBLAS) and FFT, several LAPACK routines have been implemented in several commercial packages (Jacket, CULA) and free software (MAGMA).

      The OpenCL implementation in Mac OS X has FFT and there are libraries for BLAS (from sourceforge) and MAGMA gives you some type of LAPACK implementation.

      I work with HPC systems based on nVIDIA GPU's in a research environment - it's still a lot of work (as all research/cluster programs are) but it's certainly doable and can most certainly accelerate some calculations but it depends highly on the application and even more so on the coder.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
  8. Re:Not really news... by Dynetrekk · · Score: 2, Interesting

    I'm no expert, but from what I understand, it wouldn't be at all surprising. IBM has been regularly using their Power processors for supercomputers, and the architecture is (largely) the same. The Cell has some extra graphics-friendly floating-point units, but it's not entirely different from the CPUs IBM has been pushing for computation in the past. I'm not even sure if the extra stuff in the Cell is interesting in the supercomputing arena.

  9. IIS 3D by curado · · Score: 2, Interesting

    So.. webpages will soon be available in 3D with anti-aliasing and realistic shading?

  10. Wouldn't a DSP do better? by 91degrees · · Score: 2, Interesting

    So why a GPU rather than a dedicated DSP? Seems they do pretty much the same thing except a GPU is optimised for graphics. A DSP offers 32 or even 64 bit integers, have had 64 bit floats for a while now, allow more flexible memory write positions, and can use the previous results of adjacent values in calculations.

    1. Re:Wouldn't a DSP do better? by pwnies · · Score: 2, Informative

      Price. GPUs are being mass produced. Why create a separate market that only has the DSP in it (even if the technology is already present and utilized by GPUs) for the relatively small number of servers that will be using them?

  11. Crysis 2... by drc003 · · Score: 2, Funny

    ...coming soon to a server farm near you!

    1. Re:Crysis 2... by JorgeM · · Score: 2, Interesting

      I'd love this, actually. My geek fantasy is to be able to run my gaming rig in a VM on a server with a high end GPU which is located in the basement. On my desk in the living room would be a silent, tiny thin client. Additionally, I would have a laptop thin client that I could take out onto the patio.

      On a larger scale, think Steam but with the game running on a server in a datacenter somewhere which would eliminate the need for hardware on the user end.

  12. RemoteFX by JorgeM · · Score: 2, Interesting

    No mention of Microsoft's RemoteFX coming in Windows 2008 R2 SP1? RemoteFX uses the server GPU for compression and to provide 3D capabilities to the desktop VMs.

    Any company large enough for a datacenter is looking at VDI and RemoteFX is going to be supported by all of the VDI providers except VMware. VDI, not relatively niche case massive calculations, will put GPUs in the datacenter.

  13. Re:Notice in TFA by binarylarry · · Score: 2, Interesting

    Not only that, but they posit that Microsoft's solution solves the issue of both Nvidia's proprietary-ness and the OpenCL board's "lack of action."

    Fuck this article, I wish I could unclick on it.

    --
    Mod me down, my New Earth Global Warmingist friends!
  14. Modern GPUs, for all their hype, are just DSPs by pslam · · Score: 3, Interesting

    I could almost EOM that. They're massively parallel, deeply pipelined DSPs. This is why people have trouble with their programming model.

    The only difference here is the arrays we're dealing with are 2D and the number of threads is huge (100s-1000s). But each pipe is just a DSP.

    OpenCL and the like are basically revealing these chips for what they really are, and the more general purpose they try to make them, the more they resemble a conventional, if massively parallel, array of DSPs.

    There's a lot of comments on this subject along the lines of "Why couldn't they make it easier to program?" Well, it always boils down to fundamental complexities in design, and those boil down to the laws of physics. The only way you can get things running this parallel and this fast is to mess with the programming model. People need to learn to deal with it, because all programming is going to end up heading this way.

    1. Re:Modern GPUs, for all their hype, are just DSPs by pclminion · · Score: 2, Interesting

      There's a lot of comments on this subject along the lines of "Why couldn't they make it easier to program?"

      Why should they? Just because not every programmer on the planet can do it doesn't mean there's nobody who can do it. There are plenty of people who can. Find one of these people and hire them. Problem solved.

      Most programmers can't even write single-threaded assembly code any more. If you need some assembly code written, you hire somebody who knows how to do it. I don't see how this is any different.

      As far as whether all programming will head this direction eventually, I don't think so. Most computational tasks are data-bound, and throughput is enhanced by improving the data backends, which are usually handled by third parties. We already don't know how the hell our own systems work. For the people who really need this kind of thing, you need to go out and learn it or find somebody who knows it. Expecting that the whole world can do it is crazy thinking.

  15. GPU apps are pretty specific... by bored · · Score: 2, Insightful

    I've done a little CUDA programming, and I've yet to find significant speedups doing it. Every single time, some limitation in the arch keeps it from running well. My last little project ran about 30x faster on the GPU than the CPU; the only problem was that the overhead of getting it to the GPU + computation + overhead of getting it back was roughly equal to the time it took to just dedicate a CPU.
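    The arithmetic behind that is easy to sketch in Python. The function name and the timings below are invented for illustration; only the 30x kernel figure comes from the project described above, and the model assumes the transfers and the kernel run strictly one after the other.

```python
def effective_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """End-to-end speedup once host<->device copies are counted.

    cpu_seconds:      time to do the whole job on the CPU
    kernel_speedup:   how much faster the GPU does the computation alone
    transfer_seconds: total time copying data to the GPU and back
    """
    gpu_seconds = transfer_seconds + cpu_seconds / kernel_speedup
    return cpu_seconds / gpu_seconds


# Hypothetical job: 30 s on the CPU, 30x faster kernel, 29 s of copying.
print(effective_speedup(30.0, 30.0, 29.0))  # prints "1.0"
```

    With those made-up timings the 30x kernel buys nothing end to end, which matches the break-even result described above.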

    I was really excited about AES on the GPU too, until it turned out to be about 5% faster than my CPU.

    Now if the GPU was designed more as a proper coprocessor (ala early x87, or early Weitek) and integrated into the memory hierarchy better (put the funky texture ram and such off to the side) some of my problems might go away.

  16. Re:How much number-crunching is your server doing? by smallfries · · Score: 2, Informative

    No, it's the difference between "efficiency" and what is claimed as "efficient" to get a paper published. That's a really bad citation for AES on GPUs as there is a line of prior work going back to Cook and Cryptographics. In fact that paper is a classic example of getting something into the literature that has already been done. The authors have submitted it to an unrelated conference and failed to cite the relevant work.

    If we look at their best figures, throw away the 15x claimed speedup, as it doesn't consider memory transfer costs. The 5x speedup is more realistic. The GPU that they use (8800gtx) has 128 stream processors running at 1.35GHz. The comparison is a PIV running at 3GHz. Roughly speaking, we can compare the cycles taken on each platform as a measure of the work done. The graphics card stream processors perform 57x more clock cycles.

    The central workload in AES for high-performance is completely memory bound. The cycles are just used to stage results from memory and perform XOR instructions. So the stream processors only execute the code 5x quicker with 57x more clocks and a huge memory bandwidth advantage that I can't be bothered to look up.

    So no, 10x less output per clock is not "efficient" in my book. But if you publish your paper in a crappy unrelated conference then you will get away with it.
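    For anyone checking the sums, here they are as a few lines of Python. The helper name is made up; the inputs are the figures quoted above (128 stream processors at 1.35GHz, a 3GHz CPU, a 5x measured speedup). It counts raw clock cycles only and says nothing about what each cycle can actually do.

```python
def clock_ratio(units, unit_ghz, cpu_ghz):
    """Aggregate GPU clock cycles per CPU clock cycle."""
    return units * unit_ghz / cpu_ghz


ratio = clock_ratio(128, 1.35, 3.0)   # roughly 57.6x more cycles
per_clock_deficit = ratio / 5.0       # 5x speedup spread over those cycles
print(round(ratio, 1), round(per_clock_deficit, 1))
```

    The second number comes out a little over 11, which is where the "10x less output per clock" figure comes from.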

    --
    Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php