Slashdot Mirror


Using GPUs For General-Purpose Computing

Paul Tinsley writes "After seeing the press releases from both Nvidia and ATI announcing their next-generation video card offerings, I got to thinking about what else could be done with that raw processing power. These new cards weigh in with transistor counts of 220 and 160 million (respectively), with the P4 EE core at a count of 29 million. What could my video card be doing for me while I am not playing the latest 3D games? A quick search brought me to some preliminary work done at the University of Washington with a GeForce4 Ti 4600 pitted against a 1.5GHz P4. My favorite excerpt from the paper: 'For a 1500x1500 matrix, the GPU outperforms the CPU by a factor of 3.2.' A PDF of the paper is available here."

396 comments

  1. The day is saved by drsmack1 · · Score: 5, Funny

    Now I finally have a use for the 20 Voodoo 2 cards I have in a box in the basement. Now I can have my very own supercomputer. I just need some six pci slot motherboards.... Instant cluster!

    1. Re:The day is saved by Anonymous Coward · · Score: 0, Offtopic

      Unless those Voodoo 2s have magically grown T&L units, they're not going to do you much good.

    2. Re:The day is saved by Anonymous Coward · · Score: 0

      Please leave your name and phone number here and I will send them to you immediately.

    3. Re:The day is saved by Anonymous Coward · · Score: 0, Offtopic

      On second thoughts I don't want them. I'd like one of your telephone-teleporters instead.

    4. Re:The day is saved by Frnknstn · · Score: 0, Offtopic

      Surely the real untapped goldmines are the shaders?

      --
      If it's in you sig, it's in your post.
    5. Re:The day is saved by PygmySurfer · · Score: 5, Funny

      Unless those Voodoo 2s have magically grown T&L units, they're not going to do you much good.

      Maybe they have. They've been trapped in that box together in the basement for a long time.

    6. Re:The day is saved by notsoclever · · Score: 1

      Voodoo 2s didn't have shader logic. They were very much fixed-function, and very low-precision at that.

      --
      There are 10 kinds of people: ones who understand ternary, ones who don't, and ones who think this joke is about binary
    7. Re:The day is saved by Directrix1 · · Score: 4, Interesting

      Doesn't anybody find it annoying that 3-D operation is being hardwired into the video card to begin with? Why aren't we making 200 million transistor math coprocessors with high bus speeds, uncoupled from the video card? This way we wouldn't have to keep getting a new video card every time we want to upgrade our system's 3-D performance. Since these operations are highly symmetric, you could put an array of these into one machine to upgrade incrementally. Also, this would make the issue of how to access your GPU for other purposes irrelevant, as it would be a math coprocessor expected to be used as such anyway. And the best reason for doing it this way: OpenGL (and DirectX too) could become more of a thick software layer on top of the generic coprocessor, and since the coprocessors would eventually standardize on a common instruction set, you wouldn't need a new version of OpenGL or DirectX for every new coprocessor release. What do you guys think?

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
    8. Re:The day is saved by Metasquares · · Score: 4, Insightful
      This way we wouldn't have to keep getting a new video card every time we want to upgrade our system's 3-D performance.
      I think you've just answered your own question.
    9. Re:The day is saved by Anonymous Coward · · Score: 0

      The advantage video cards have is that they are easy to upgrade; any company can make an AGP card. However, there is no standard slot for a coprocessor on a motherboard. Intel and AMD aren't exactly eager to create one either, since it would cut into their chip sales. They like things the way they are, since right now changing your CPU to a different manufacturer requires a new motherboard (vendor lock-in).

      They have no desire to create a situation where most of your game performance comes from a separate chip (likely made by nVidia or IBM).

      You'd have to get this thing to work in a PCI slot, and then there are a lot of driver issues that need to be worked out before it can reach home users.

      If you got Mathematica, Renderman, and some engineering apps to demand something like this it might work.

    10. Re:The day is saved by Anonymous Coward · · Score: 0

      Are you suggesting a Beowulf cluster of GPUs?

    11. Re:The day is saved by mabinogi · · Score: 1

      I think you'll find that it is far cheaper to obtain the level of 3D performance you get from current video cards using dedicated processors designed for that purpose alone, than by the use of any number of general purpose vector coprocessors....

      And you'd still have the upgrade issue, only you'd be upgrading the co-processor instead -or buying a new one to put on the board - but realistically there's not a lot of room on a motherboard for many extra large processors especially when you start taking cooling into consideration.

      You'd also have memory issues to deal with - currently video card makers can far more easily experiment with new memory technologies than motherboard manufacturers....and the co-processor idea would probably mean a shared memory model, so you'd be tied to whatever the current consumer standards were.

      I'm not saying it wouldn't work...I don't know enough to know that, but those are some of the potential problems I can see.

      --
      Advanced users are users too!
    12. Re:The day is saved by Frnknstn · · Score: 1

      Yeah, no shit, but you didn't RTFA, and have no idea what you are talking about, do you? The T&L functions are totally useless by themselves. In all the cards that have T&L but no programmable shaders (e.g. GeForce, GeForce2) the rendering pipeline is only partly exposed to the developer, and thus useless for anything but drawing 3D graphics. With programmable shaders, the complete vertex transform process is accessible, and that is what this paper talks about using.

      --
      If it's in you sig, it's in your post.
    13. Re:The day is saved by Gumber · · Score: 1
      This way we wouldn't have to keep getting a new video card every time we want to upgrade our system's 3-D performance.
      How much do you think you'd save that way? $20, maybe? GPUs are usually paired closely with the graphics memory subsystem. I'd guess that GPU + memory amounts to most of the cost of goods in a higher-end vid card.

      At this point there isn't much of a market for massively parallel floating point (or integer) performance for anything but 3D graphics, much to the chagrin of CPU makers, so it's not clear to me what incentive GPU makers have to produce a more general purpose and more commodified product.

      I'm trying to imagine how the market might evolve from here.

      Clearly there are some computationally intensive tasks that could benefit from a more general purpose math unit, but not as general purpose as today's CPUs.

      Some of those tasks are of interest to a relatively small number of people, but those people will buy as much as they can afford. If they could add four or six big vector units per PC in a commodity cluster, they'd do it. So, while it may be a relatively small market in terms of # of customers, they are likely to buy a disproportionate number of units.

      Some of those tasks are of interest to a much larger number of people, but those people may be served, and satisfied, with just being able to tap the lone GPU on their lone vid card for things like video or audio encoding.

      Taken together, the combination of the two markets may be big enough to be interesting to one or more GPU vendors, and both markets would benefit from a standardized general programming model.

      What is less clear is whether both markets really demand an architecture that supports multiple GPUs per system for non-real-time 3D tasks. Of course, the successors to PCI and AGP may make this question moot: PCs might have suitable infrastructure for supporting multiple cards without any special effort on the part of GPU manufacturers.

      It seems likely that we'll see broader support for using GPUs for non-traditional tasks. If you think about it, the GPU makers stole a big piece of "instruction share" (% of total instructions executed in a PC) from Intel and other CPU makers, and along with it, they captured revenue growth that would have otherwise gone to CPU makers. Agreeing to some level of standardization in programming models (a la DirectX & OpenGL) and supporting the use of GPUs for non-traditional tasks will allow the GPU makers a chance at grabbing even more revenue growth away from the CPU makers.

    14. Re:The day is saved by gumbi+west · · Score: 1

      Hi, it's called an SGI. The 320 was from the days of AGP 2x and ran at approximately AGP 12x. A coworker tried to get rid of one of these and I took a look inside. There was no separate graphics coprocessor that I could find.

    15. Re:The day is saved by Directrix1 · · Score: 2, Interesting

      I don't understand your first statement. The fact that these GPUs exist and are being used to do so many things would imply that it's actually not that specialized. It just has a fat pipeline. Matrix operations are very common, and many everyday tasks, even web browsing, can easily take advantage of them for image decompression and video/audio streaming. And maybe in the future, if we get the whole "we don't need a dedicated coprocessor" idea out of our heads, it could be used for things like Neural Network Assistants, faster/better speech recognition, and other more complex tasks which are uncommon on the desktop only because the desktop can't handle them effectively right now.

      For the positioning and cooling, well there is one in there right now. There is enough space more than likely even for more than one.

      Also, I'm not saying lets not give the sucker a cache. It would more than likely need a cache of its own dedicated memory to effectively operate just like any processor.

      When I was about 15 and I first started reading about the first GPUs, all I could think about was, "Boy is this a step in the wrong direction." I believe in hardware whose purposes are cleanly separated. Well, the GPU thing has had its heyday, so why not start making general purpose coprocessors now so every application (well, a lot of applications) can get a nice boost? The instructions already resemble a normal processor's anyway, so why not?

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
    16. Re:The day is saved by Directrix1 · · Score: 1

      So you think manufacturers just want to have that extra bit tacked on top? Don't you think they would rather realize extra profit from the same product by people using it for things other than 3-D graphics?

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
    17. Re:The day is saved by Directrix1 · · Score: 1

      How much do you think you'd save that way? $20, maybe? GPUs are usually paired closely with the graphics memory subsystem. I'd guess that GPU + memory amounts to most of the cost of goods in a higher-end vid card.
      The savings would be in a less complicated setup. I wasn't suggesting getting rid of its memory; CPUs have caches and these would need a pretty damn good cache of their own.

      At this point there isn't much of a market for massively parallel floating point (or integer) performance for anything but 3D graphics, much to the chagrin of CPU makers, so it's not clear to me what incentive GPU makers have to produce a more general purpose and more commodified product.
      Image/Video/Audio de/compression could easily utilize it. Future desktops with AI technologies, especially neural nets, could easily use it, and experimentation in this area would be promoted because of the availability of capable hardware. Also, more high quality video/audio interactions would be possible without having to settle for retrofitted extensions to the current CPU model, which was designed more for logical execution than volume throughput anyway. The market isn't here for them because nobody has shown the market its full potential. Once it gets here, it will be here to stay. If we don't try to portray the GPU as the fad that it is, then we will just be relegating ourselves to technological stagnation because we wouldn't be exploring the other areas as quickly as we could otherwise.

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
    18. Re:The day is saved by Anonymous Coward · · Score: 0

      There is a really good reason for not separating these components of a video card: bandwidth and latency. The PCI bus is not going to be able to provide the same quality of service as having these chips on the same board.

    19. Re:The day is saved by notsoclever · · Score: 1
      Hey, I said nothing about the article, I was just responding to someone talking about the Voodoo2 and its shaders (or lack thereof in this case).

      Ass.

      --
      There are 10 kinds of people: ones who understand ternary, ones who don't, and ones who think this joke is about binary
  2. What?!?!?! by DarkHelmet · · Score: 5, Funny
    What? Matrix operations run faster on a massively parallel form of vector processor over a general purpose processor? How can that be?

    Intel's been telling me for years that I need faster hardware from THEM to get the job done...

    You mean........ they were lying?!?!?

    CRAP!

    --
    /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
    1. Re:What?!?!?! by Anonymous Coward · · Score: 0

      Yeah, General Purpose Units for
      General Purpose Computing.

      There's a stack of withered lettuce for you. Damn the weekend, It forces Americans to go limp. ;)

    2. Re:What?!?!?! by Anonymous Coward · · Score: 0

      There's a stack of withered lettuce for you. Damn the weekend, It forces Americans to go limp. ;)

      I don't understand what you mean. I googled for withered lettuce, but that provided nary little insight.

    3. Re:What?!?!?! by Anonymous Coward · · Score: 5, Funny

      Don't worry, the Intel processor is *much* faster at the internet thingy. Graphics cards only do the upload to screen thing, and everyone knows the internet is all about downloading.

      And besides, nobody needs or wants Matrix operations anyway. Did you see how bad Matrix Reloaded was? That was *just* reloading, imagine how bad Matrix Multiplying is. You get the idea.

    4. Re:What?!?!?! by Anonymous Coward · · Score: 0
      the Intel processor is *much* faster at the internet thingy

      How can that be? The Interweb is nothing but graphix and musick.

    5. Re:What?!?!?! by pommiekiwifruit · · Score: 1

      Also, the Intel (ding! dong ding dong dang) Quintium processor allows you to read CD-ROMs and play music; simply insert the CD-ROM into the heat-sink and the chip will read the disc. It plays music by vibrating the motherboard, allowing for dolby sound.

    6. Re:What?!?!?! by Anonymous Coward · · Score: 1, Funny

      Uh oh, your computer has performed an Illegal Operation. Quick, take it outside and bury it before the cops come!

  3. Link to previous discussion on same/similar sub... by 8282now · · Score: 5, Informative
  4. Googled HTML by balster+neb · · Score: 5, Informative

    Here's an HTML version of the PDF, thanks to Google.

    1. Re:Googled HTML by JPriest · · Score: 1

      Adobe really needs to create a separate "lite" version of Reader. That app does way more than I need it to do.

      --
      Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.
    2. Re:Googled HTML by dawnsnow · · Score: 1

      You could make a lighter version by removing unnecessary plug-ins. It seems my browser displays PDF documents more quickly that way. http://texturizer.net/firefox/faq.html#acrobat

    3. Re:Googled HTML by Anonymous Coward · · Score: 0

      Agreed. Someone needs to port that OSX PDF plugin thingy to Windows.

    4. Re:Googled HTML by RatRagout · · Score: 1

      Well, the graphs surely look nice...

    5. Re:Googled HTML by Anonymous Coward · · Score: 0

      yer could try this thing.

      Lockergnome's (brief) writeup

      the download page

      slide

    6. Re:Googled HTML by CastrTroy · · Score: 1

      I use XPDF for Linux. Starts up a lot faster than the official Adobe Reader and works just as well for viewing documents.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    7. Re:Googled HTML by notsoclever · · Score: 1

      What, you mean Quartz, the OSX display engine?

      --
      There are 10 kinds of people: ones who understand ternary, ones who don't, and ones who think this joke is about binary
    8. Re:Googled HTML by Anonymous Coward · · Score: 0

      But is completely unusable under a remote X session. Even simple PDFs with only text are dog slow over a 10Mbit connection. I can browse graphic heavy webpages faster on Mozilla than I can scroll through a pdf in XPDF. I think the XPDF team really need to work on the screen drawing code.

    9. Re:Googled HTML by bhtooefr · · Score: 1

      They do keep old versions around for a while. Except for eBooks (which can easily be cracked and converted to PDF), Acrobat Reader 5.1 is pretty good. However, avoid Adobe Reader 6.0 at all costs.

  5. video stuff by rexguo · · Score: 4, Interesting

    At my workplace, I'm looking into using GPUs to do video analysis: things like cut-scene detection, generating multi-resolution versions of a video frame, applying video effects and other proprietary technologies that were previously done on the CPU. The combination of pixel shaders and floating-point buffers really makes GPUs a Super-SIMD machine if you know how to exploit it.

    --
    www.rexguo.com - Technologist + Designer
    1. Re:video stuff by Misagon · · Score: 1

      A few years ago I was looking at offloading the transform stage of decoding digital video to a GPU.
      The inverse wavelet transform consists basically of filtering a picture to twice its size and adding the differences from the original. This type of operation is perfect for a GPU. The type of GPU that was needed then is considered old by gamers these days. :)
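
Editor's sketch of the operation described above, with heavy caveats: a real inverse wavelet transform uses specific synthesis filters, while this version assumes plain pixel repetition as a stand-in for the filtering step. The function names and sample data are hypothetical. The point is only that each output pixel is computed independently, which is the shape of work a GPU handles well.

```python
# Illustrative only: upsample_2x and reconstruct_level are hypothetical names,
# and pixel repetition is an assumed stand-in for a real synthesis filter.

def upsample_2x(image):
    """Double width and height by repeating each pixel."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reconstruct_level(coarse, detail):
    """One reconstruction step: upsample the coarse image, then add the stored differences."""
    upsampled = upsample_2x(coarse)
    return [
        [u + d for u, d in zip(up_row, detail_row)]
        for up_row, detail_row in zip(upsampled, detail)
    ]


coarse = [[10, 20],
          [30, 40]]
detail = [[1, -1, 0, 0],
          [0, 0, 2, -2],
          [0, 0, 0, 0],
          [5, 5, 5, 5]]
full = reconstruct_level(coarse, detail)
```

Every element of `full` depends on one coarse pixel and one detail value, so the whole step could in principle be evaluated for all pixels at once.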

      --
      "We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
    2. Re:video stuff by dddno · · Score: 1

      Here is a paper, though not very recent, describing the use of GPUs to aid in realtime motion estimation. I believe the latest hardware could do miracles here, especially with a fast back channel like the PCI successor.

    3. Re:video stuff by Anonymous Coward · · Score: 1, Informative

      Pinnacle has a video editing product called Liquid Edition that uses the GPU for processing video effects & such.

    4. Re:video stuff by Qutec · · Score: 0

      Can You Imagine a Beowulf Cluster of These?

  6. As has been said many time before ... by keltor · · Score: 5, Insightful

    GPUs are very fast ... at performing vector and matrix calculations. This is the whole point. If general computing CPUs were capable of doing vector or matrix calcs very efficiently, we would probably not have GPUs.
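
Editor's sketch of why matrix work parallelizes so readily. This is plain Python, not GPU code, and the function names are made up for illustration. Each output element of a matrix product reads one row and one column and depends on no other output element, so all of them could be computed at the same time.

```python
# Illustrative only: matmul_element and matmul are hypothetical helper names.

def matmul_element(a, b, i, j):
    """Compute one output element from row i of a and column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b):
    """Dense matrix product built from independent per-element computations."""
    rows, cols = len(a), len(b[0])
    return [[matmul_element(a, b, i, j) for j in range(cols)] for i in range(rows)]


a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]
product = matmul(a, b)
```

A serial CPU walks the nested loops one element at a time; hardware with many arithmetic units can hand each `(i, j)` pair to a different unit.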

    1. Re:As has been said many time before ... by lazy_arabica · · Score: 5, Interesting
      GPUs are very fast ... at performing vector and matrix calculations. This is the whole point. If general computing CPUs were capable of doing vector or matrix calcs very efficiently, we would probably not have GPUs.
      Yes. But 3D graphics are not the only use of these mathematical objects; I wonder if it would be possible to use a GPU to perform video encoding or digital sound manipulation at a higher speed, as both operations require matrices. I'm also sure they could take advantage of these processors' vector manipulation capabilities.
    2. Re:As has been said many time before ... by nickos · · Score: 1

      I remember hearing a really cool story about someone using the blitter in an Amiga for some really interesting non graphical process. I wish I could remember the exact purpose it was being used for, but I think it was used to filter data of some kind.

    3. Re:As has been said many time before ... by gaj · · Score: 1
      Yeah, we used to use the blitter for lots of things. Cellular automata (e.g. Conway's Game of Life) for example. The only bummer was that only Chip Memory could be used, which, depending upon which version of the Agnus chip (where the blitter lived) you had, was 2MB at most.

      Ahhh ... the good old days, when computers were actually documented well so we could hack 'em. Wouldn't it be cool if today's manufacturers did the same? The Amiga Hardware Reference had detailed information about all the custom chips (Agnus, Copper and Denise, mainly) including register info, programming examples and, IIRC, schematic info.
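
Editor's sketch of one Game of Life generation, to show why a cellular automaton suits hardware that applies one operation across a whole bitmap. This does not reproduce the blitter technique, which worked with bitwise logic on bitplanes; it is ordinary Python, and the wraparound edges are an assumption made to keep the example short.

```python
# Illustrative only: not the Amiga blitter method. Edges wrap around (assumed).

def live_neighbours(grid, r, c):
    """Count the eight neighbours of cell (r, c), wrapping at the edges."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        grid[(r + dr) % rows][(c + dc) % cols]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )


def life_step(grid):
    """Apply the same rule to every cell, reading only the previous generation."""
    return [
        [
            1 if live_neighbours(grid, r, c) == 3
            or (cell == 1 and live_neighbours(grid, r, c) == 2)
            else 0
            for c, cell in enumerate(row)
        ]
        for r, row in enumerate(grid)
    ]


# A "blinker": three live cells in a row, which flips orientation every step.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
next_generation = life_step(blinker)
```

Because every cell reads only the old grid, the new grid can be produced for all cells in one pass, which is the property that made bulk bitmap hardware usable for it.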

    4. Re:As has been said many time before ... by Slack3r78 · · Score: 3, Informative

      Actually, the GeForce 6800 includes the hardware to do just that. I'm surprised no one else has mentioned it by now, as I thought it was one of the cooler features of the new NV40 chipset.

    5. Re:As has been said many time before ... by nickos · · Score: 1

      Yeah, I've got a couple of those old Addison-Wesley books here. Remember though that C= stopped documenting their custom chipset goodness when they wanted everyone to write clean API (non hardware-bashing) friendly code.

      Remember: "We made Amiga, They fucked it up"

    6. Re:As has been said many time before ... by Anonymous Coward · · Score: 3, Informative

      ATI has had this for even longer. The all-in-wonder series uses the video card to do accelerated encoding and decoding.

      Also, I believe that mplayer, the best video player/encoder I have seen, also uses OpenGL (and thus the video card on a properly configured system) to do playback.

      Personally, I don't think there is anything really new in this article.

    7. Re:As has been said many time before ... by Slack3r78 · · Score: 1

      From the article:
      "...MPEG 2 MPEG 4 WMV9 DiVX decoding and encoding, scaling, frame rate conversion, and anything else you'd like it to do for you..."

      Both ATI and nVidia have had hardware decoding standard since the TNT2 days - it's the ability to do encoding and basically turn the GPU into a free DSP unit which is new.

    8. Re:As has been said many time before ... by Anonymous Coward · · Score: 0

      Actually, the Radeons have had MPEG2 decode acceleration (motion compensation and iDCT) for a long time. A few Nvidia cards also have this capability (the FX series, and the GeForce4 MXs).

      They do not have hardware for MPEG4, WMV9, etc.

      The new ATI and Nvidia cards claim encoding support. But I have seen no software to demonstrate this, and no specs that say what the performance is and what resolutions are supported.

    9. Re:As has been said many time before ... by jswitte · · Score: 1

      From the article: "...MPEG 2 MPEG 4 WMV9 DiVX decoding and encoding, scaling, frame rate conversion, and anything else you'd like it to do for you..."

      Umm, I'm looking at the thomson-micro2002 pdf right now, and I can't find 'MPEG' anywhere in it? What page was this on? I just emailed the first author asking about exactly this..
      Jim

    10. Re:As has been said many time before ... by drinkypoo · · Score: 1

      Both ATI and nVidia have included functionality to accelerate video playback for some time, but they haven't been doing MPEG decoding on-chip. ATI provided something called "motion compensation" and both of them provided video-specific scaling functions which offloaded that from the system, letting it concentrate on I/O and actually decoding the video stream. Problem is that especially with ATI cards, most players didn't make full use of the hardware, so you got stuck with one crappy player... anyway it's a digression but the point is that the hardware does very little.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  7. 178 Million in the P4EE by 2megs · · Score: 5, Insightful

    The Pentium 4 EE actually has 178 million transistors, which puts it in between ATI's and NVIDIA's latest.

    In all of this, keep in mind that there's computing and there's computing...the kind of computing power in a GPU is excellent for doing the same numeric computation to every element of a large vector or matrix, not so much for branchy decisiony type things like walking a binary tree. You wouldn't want to run a database on something structured like a GPU (or an old vector-processing Cray), but something like a simulation of weather or molecular modeling could be perfect for it.

    The similarities of a GPU to a vector processing system bring up an interesting possibility...could Fortran see a renaissance for writing shader programs?
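
Editor's sketch of the two kinds of work contrasted above, written in ordinary Python with hypothetical names. The first function does identical arithmetic on every element with no dependence between elements. The second cannot know which node to visit next until the current comparison has finished, so its steps are inherently sequential and branch-heavy.

```python
# Illustrative only: all names here are hypothetical.

def scale_and_offset(values, scale, offset):
    """Same arithmetic on every element; no element depends on another."""
    return [scale * v + offset for v in values]


class Node:
    """A binary search tree node."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def tree_contains(node, key):
    """Each step branches on a comparison, and the next step depends on the result."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


scaled = scale_and_offset([1.0, 2.0, 3.0], 2.0, 0.5)
root = Node(8, Node(3, Node(1), Node(6)), Node(10))
found = tree_contains(root, 6)
```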

    1. Re:178 Million in the P4EE by Knightmare · · Score: 5, Informative

      Yes, it's true that it has that many transistors BUT, only 29 million of them are part of the core, the rest is memory. The transistor count on the video cards does not count the ram.

    2. Re:178 Million in the P4EE by DigiShaman · · Score: 1

      Ahh...but how much of the 178 million is devoted to cache and how much is devoted to core? From what I understand, GPUs don't have much if any cache as it ties directly into video memory (which serves as its cache as well).

      --
      Life is not for the lazy.
    3. Re:178 Million in the P4EE by Anonymous Coward · · Score: 0

      The writeup stated "core" transistor count - that is, not memory.

      The P4EE has 178 million transistors, yes, but the vast majority of them are the L1/2 cache. The GPUs have most of their transistors as their core, not as memory.

    4. Re:178 Million in the P4EE by Jeff+DeMaagd · · Score: 1

      The transistor count you state probably includes the L2 cache, not the core transistors of the respective GPUs and CPUs.

    5. Re:178 Million in the P4EE by LinuxGeek · · Score: 3, Informative
      If they are ignoring the cache on the P4 EE, then why mention the Extreme Edition at all? Cache size is the only difference between the Xeon based EE and a regular Northwood P4. Also, modern GPUs certainly do have cache. Read this old GeForce4 preview.
      The Light Speed Memory Architecture (LMA) that was present in the GeForce3 has been upgraded as well, with it's major advancements in what nVidia calls Quad Cache. Quad Cache includes a Vertex Cache, Primitive Cache, Texture Cache and Pixel Caches. With similar functions as caches on CPU's, these are specific, they store what exactly they say.
      Another good article has a block diagram showing the cache structures of the GeForce FX GPU. Nvidia and ATI both keep quiet about the cache sizes on their GPUs, but that doesn't mean that the full transistor count is dedicated to the processing core.
      --

      Kindness is the language which the deaf can hear and the blind can see. - Mark Twain
    6. Re:178 Million in the P4EE by alphakappa · · Score: 2, Funny

      Please ANYTHING BUT FORTRAN!!!!!!! Seriously, FORTRAN needs serious reworking to be user friendly in today's age. It was fine a decade or two ago when people were not used to user friendly languages. COBOL anyone? FORTRAN has its uses, but it's horribly, horribly tough to use if you want to combine number crunching with other stuff such as string manipulation.

      --
      "When the only tool you own is a hammer, every problem begins to resemble a nail." - Abraham Maslow (1908-1970)
    7. Re:178 Million in the P4EE by gunix · · Score: 5, Insightful

      Well, it's like UNIX: it's user-friendly, it just selects its friends very carefully.
      IMHO, the perfect friend is someone interested in maximum performance and knows how to program and knows something about computer hardware.

      Have you looked at fortran 90, 95 or 2000?

      --
      Evolution of Language Through The Ages: 6000 BC : ungh, grrf, booga 2000 AD : grep, awk, sed
    8. Re:178 Million in the P4EE by alphakappa · · Score: 1

      I generally agree with your viewpoint, but then you can also say that assembly is the best way to program coz you can get the tightest and fastest code, but unless you are trying to optimize code, it's a wreck to program in. Say I have some dsp code which I want to get to run under a certain mips, I would roll up my sleeves and write certain sections in assembly, but for my general program I would still use C coz it doesn't hamper me with lack of simple syntax. The same analogy applies here, if I have a huge algorithm to write, I would hate to do it in Fortran. I've seen Fortran 77 (to be fair, I haven't seen 90,95 or 2000, so I may be wrong in saying that it's not user friendly now), and it was a nightmare to program and debug. I believe the same functionality could be achieved using a better syntax, and if F95/2000 has done it, then it's a great thing to have.

      --
      "When the only tool you own is a hammer, every problem begins to resemble a nail." - Abraham Maslow (1908-1970)
    9. Re:178 Million in the P4EE by Bender_ · · Score: 3, Insightful

      The transistor count on the video cards does not count the ram

      How do you know? In fact, modern GPUs require a large amount of small scattered memory blocks. Texture caches, FIFOs for fragment/pixels/texels when they are not in sync, caches for vertex shader and pixel shader programs etc etc..

      More recent GPUs are notorious for their incredibly long latencies. Long latencies imply that a lot of data has to be stored in chip..

    10. Re:178 Million in the P4EE by Hast · · Score: 2, Informative

      Well, it's really more that the pipelines are very long. On the order of 600 pipeline stages, and that's pretty damned long. (P4, which is a CPU with a deep pipeline, has 21 stages IIRC.)

      They do of course store data between those stages, and there are caches on the chip. Otherwise performance would be shot all to hell.

      I doubt that the original statement that GPU designs don't count the on chip memory is correct. That just seems like an odd way to do it.

    11. Re:178 Million in the P4EE by Danathar · · Score: 1

      So you are saying that a graphics card could be good for something like seti@home? Or am I completely wrong?

    12. Re:178 Million in the P4EE by 10Ghz · · Score: 1

      Of course GPUs have SOME cache. But they don't have a whole lot of it. At least, nowhere near the 2.5 megs that the P4EE has!

      --
      Lesbian Nazi Hookers Abducted by UFOs and Forced Into Weight Loss Programs - -all next week on Town Talk.
    13. Re:178 Million in the P4EE by squiggleslash · · Score: 3, Informative
      Seriously, FORTRAN needs serious reworking to be user friendly in today's age.(...) it's horribly, horribly tough to use if you want to combine number crunching with other stuff such as string manipulation.
      Methinks you're confusing "user friendly" with "powerful". It's not that FORTRAN's string manipulation functions aren't user friendly, they're just crap.

      (For those unaware of how FORTRAN does strings, they're stored in fixed length arrays that are padded to the end with spaces. When you want to compare two strings you have to make them the same length with a "substr" type operation (eg "STRING1(1:37) .EQ. STRING2(1:37)") - it's easy to use, just too crude to be usable.)

      Saying FORTRAN isn't user friendly on the basis of its string handling is like saying Commodore BASIC 2 wasn't user friendly because of its procedures, erm, I mean subroutines. What could be hard about GOSUB and RETURN? Nothing. It's just they're not very useful.

      --
      You are not alone. This is not normal. None of this is normal.
    14. Re:178 Million in the P4EE by mc6809e · · Score: 2, Informative

      Yes, it's true that it has that many transistors BUT, only 29 million of them are part of the core, the rest is memory. The transistor count on the video cards does not count the ram.

      Sure it does, it's just that the ram isn't cache, it's mostly huge register files.

    15. Re:178 Million in the P4EE by trashme · · Score: 1

      You're missing the point.

      He is saying the transistor counts are not quite comparable. The GPU's transistor count is (probably?) including the on-chip caches. They then compared it to an Intel CPU with a huge cache, and then ignored the cache when giving a transistor count.

    16. Re:178 Million in the P4EE by wjwlsn · · Score: 1

      When you want to compare two strings you have to make them the same length with a "substr" type operation (eg "STRING1(1:37) .EQ. STRING2(1:37)") - it's easy to use, just too crude to be usable.)

      Incorrect. Fortran will automatically pad the shorter string with spaces to perform the comparison.

      By the way, do you see how easy it is to perform substring operations in Fortran? Since Fortran treats strings like arrays of characters, substring operations become trivially easy to perform. Combine this array-like syntax with the standard selection of intrinsic character functions, and I can do anything I want with a string in Fortran without ever using a single pointer.
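
      A rough Python sketch of the two behaviours argued about above: comparison with blank padding, and the 1-based inclusive substring syntax. `fortran_eq` and `substring` are hypothetical helper names, not Fortran intrinsics; they only model the rules as described in this thread.

```python
def fortran_eq(a: str, b: str) -> bool:
    """Model .EQ. on character values: the shorter operand is
    extended on the right with blanks before comparing."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def substring(s: str, start: int, end: int) -> str:
    """Model S(start:end): 1-based and inclusive at both ends."""
    return s[start - 1:end]


if __name__ == "__main__":
    print(fortran_eq("GPU", "GPU     "))   # trailing blanks are ignored
    print(fortran_eq("GPU", " GPU"))       # leading blanks are not
    print(substring("FORTRAN", 1, 4))
```

      Note that the padding only applies on the right, so leading blanks still make two strings differ.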

      --
      Getting tired of Slashdot... moving to Usenet comp.misc for a while.
    17. Re:178 Million in the P4EE by flaming-opus · · Score: 1

      The one distinction between a vector processor and a GPU used as a vector processor is that the vector CPUs have reasonable scalar performance. Most matrix math programs are MOSTLY vector math, but with a few scalar bottlenecks. What's the latency of running a branch-heavy decision tree through the long pipeline of a GPU? How big of a program can you fit on the graphics card?

      The advantage of the GPU is that you already have it on the system. But if you really need to do this complex mathematical analysis, a DSP chip is probably of better use.

      If there were many programs that made use of simd-style math, the CPUs would all have co-processors to do that math really well. Oh look! They all do. That's what altivec / SSE / etc. are.
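
      The "few scalar bottlenecks" point can be put in numbers with Amdahl's law, which is not named in the comment above but is the usual way to express it. A minimal sketch, assuming the program splits cleanly into a vectorizable fraction and a scalar remainder that runs at unchanged speed; the function name and the sample figures are illustrative only.

```python
def amdahl_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Overall speedup when only `vector_fraction` of the run time
    is accelerated by a factor of `vector_speedup`."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0.0:
        raise ValueError("vector_speedup must be positive")
    scalar_part = 1.0 - vector_fraction
    return 1.0 / (scalar_part + vector_fraction / vector_speedup)


if __name__ == "__main__":
    # Hypothetical workloads: 90% and 99% vector math, 10x faster vector unit.
    for fraction in (0.90, 0.99):
        print(fraction, amdahl_speedup(fraction, 10.0))
```

      Even a small scalar remainder caps the overall gain, which is why scalar performance matters for a vector machine.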

    18. Re:178 Million in the P4EE by Hatta · · Score: 1

      IMHO, the perfect friend is someone interested in maximum performance and knows how to program and knows something about computer hardware.

      Add a nice set of tits and I hear wedding bells.

      --
      Give me Classic Slashdot or give me death!
    19. Re:178 Million in the P4EE by Anonymous Coward · · Score: 0

      If you're referring to the DRAM on the video cards, 1) they're not part of the graphics chip (as opposed to L1/L2 integrated into the CPU die), and 2) a 128 MB card has roughly 128 * 1024 * 1024 * 8 = just over 1 billion transistors. DRAMs use 1 capacitor and 1 transistor per bit. This doesn't count the row and column decode logic, sense amps, or extra rows of bits built in case of defects.
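
      The arithmetic above, spelled out as a short Python sketch. It assumes, as the comment does, one transistor per DRAM bit and ignores decode logic, sense amps and spare rows; `dram_cell_transistors` is a made-up name for illustration.

```python
def dram_cell_transistors(megabytes: int) -> int:
    """Transistors in the DRAM cell array, at one transistor per bit."""
    bits = megabytes * 1024 * 1024 * 8
    return bits


if __name__ == "__main__":
    count = dram_cell_transistors(128)
    print(f"{count:,} transistors for a 128 MB card")
```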

  8. Website on this topic by Anonymous Coward · · Score: 5, Informative

    General-purpose computation using graphics hardware has been a significant topic of study for the last few years. Pointers to a lot of papers and discussion on the subject are available at: www.gpgpu.org

  9. Re:Not the Point by JonoPlop · · Score: 4, Interesting
    The whole point of graphic cards is that they have a dedicated purpose. Using the cards for anything that is general purpose is like using a motorcycle to tow a pop-up camper.

    No, it's like using your pop-up camper for storage space when you're using it on holidays.

  10. While not playing games? by pyrrhonist · · Score: 4, Funny
    What could my video card be doing for me while I am not playing the latest 3d games?

    Two words: virtual pr0n

    --
    Show me on the doll where his noodly appendage touched you.
    1. Re:While not playing games? by Anonymous Coward · · Score: 0
      virtual pr0n

      As opposed to?
    2. Re:While not playing games? by Anonymous Coward · · Score: 0

      virtual virtual p0rn

      it's just like virtual p0rn!

    3. Re:While not playing games? by Anonymous Coward · · Score: 1, Funny

      real porn uses less polygons

    4. Re:While not playing games? by Trejkaz · · Score: 1

      Has anyone done this yet? Surely with the current generation of graphics technologies this is possible. Take DOA3, remove some more clothes, and place the characters on top of each other in various poses. Then develop some animations which look right.

      Sometimes I feel like I must be the only person perverted enough to think of this stuff... but surely it must exist already.

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
  11. DSP using GPUs by crushinghellhammer · · Score: 3, Interesting

    Does anybody know of pointers to papers/research pertaining to using GPUs to perform digital signal processing for, say, real-time audio? Replies would be much appreciated.

  12. PDF to HTML by Libraryman · · Score: 2, Informative

    Here is a link at Adobe where you can turn any PDF into HTML.

  13. Hacking the GPU by nihilogos · · Score: 5, Informative

    Is a course being offered at caltech since last summer on using gpus for numerical work. Course page is here.

    --
    :wq
  14. What comes next. by CherniyVolk · · Score: 5, Funny


    "Utilize the sheer computing power of your video card!"

    New market blitz, hmmmm.

    SETI ports their code, and within five days their average completed work units increase 1000 fold. 13 hours later, they have evidence of intelligent life at 30000 locations within one degree.

    Microsoft gets the hint, and comes out with a brilliant plan to utilize GPUs to speed up their OS and add bells and whistles to their UI.

    And, once again, Apple and Quartz Extreme is ignored.

    1. Re:What comes next. by Barbarian · · Score: 4, Funny

      Then they throw away the results because the gpu's are not able to calculate at double precision floating point, but only at 24 or 32 bits.

    2. Re:What comes next. by Anonymous Coward · · Score: 0

      That mod parent up sig is quite clever.

    3. Re:What comes next. by BSDKaffee · · Score: 1

      Yeah, except it says Mon, February 30 next to it.

    4. Re:What comes next. by Anonymous Coward · · Score: 0

      And it was posted at 13:37pm :)

    5. Re:What comes next. by jfern · · Score: 1

      Oops, that gives it away. February 30th is never a Monday.

    6. Re:What comes next. by Krid(O'Caign) · · Score: 1

      I'm more concerned with the fact that: A: If it were true, then that would mean that either Slashdot is sitting on stories and discussions for well over 4 months, or that somebody has a temporal ISP. B: It's above the [Reply to post | Parent] tag.

    7. Re:What comes next. by Anonymous Coward · · Score: 0

      I'm more concerned with the fact February 30th never exists to begin with

    8. Re:What comes next. by renoX · · Score: 2, Insightful

      Yes, one thing shocked me in their paper: they don't talk much about the precision they use..

      Strange, because it is a big problem for using GPUs as coprocessors: scientific computation usually uses 64-bit floats or, on Intel, 80-bit floats!

    9. Re:What comes next. by Anonymous Coward · · Score: 0

      Well, on your specific planet, that's true.

    10. Re:What comes next. by phsdv · · Score: 1
      Maybe you missed the announcement: NVIDIA Corporation recently introduced its new GeForce 6800 GPU (codename NV40). Among the new features of this GPU are 64-bit floating point texture filtering and blending and support for the D3D vertex and pixel shader 3.0 standard, enabling full dynamic branching and looping in programmable shaders.

      Apparently they do support 64-bit fp....

    11. Re:What comes next. by Anonymous Coward · · Score: 0

      Dude, this was a joke. I'm a nVidia fanboy at that. Lighten up.

    12. Re:What comes next. by Anonymous Coward · · Score: 0

      It's a proprietary floating point format. Not IEEE 754.

    13. Re:What comes next. by renoX · · Score: 1

      Well, it is still better to have proprietary 64-bit floating point than true 32-bit floating point, I think..

      And thanks phsdv for pointing to me that the 6800 can do some 64bit calculation, I didn't know it.

      I'm really surprised that NVidia did this, as 32-bit should be enough precision for video rendering for a long time, even for non-real-time rendering. Strange.
      Apparently they even support 128-bit floating point calculation; it will be interesting to see the scientific papers that talk about using the 6800. With the PCI Express bus, the GPU may even be really useful as a math accelerator, much faster than I thought!

      Now the power consumption of several 6800 in a PC should be "interesting" to say the least..

    14. Re:What comes next. by SmackCrackandPot · · Score: 3, Informative

      64-bit floating point texture filtering and blending and support for the D3D vertex and pixel shader 3.0 standard,

      That's 64-bits for a four element vector (RGBA) or (XYZW), which is thus 16-bits per float. This is referred to as the 'half' floating point data type, as opposed to 'float' or 'double'. This is compatible with Renderman.

    15. Re:What comes next. by Anonymous Coward · · Score: 0

      John Carmack pushed for the 64bit internal calculations.

      The cards output to 32 bit colour, but if you round to 32 bits with each operation you start to lose accuracy. When he started working on Doom III he found that after 10 or 20 operations on a pixel there was visible banding.
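
      A toy Python sketch of how rounding between passes produces banding. It assumes 32-bit colour means 8 bits per channel, and uses just two made-up passes (darken to 25%, then brighten by 4x) on a single channel; it is not a model of any real renderer.

```python
def to_8bit(value: float) -> int:
    """Clamp a 0..1 intensity and round it to one of 256 levels."""
    return round(min(1.0, max(0.0, value)) * 255)


def darken_then_brighten(level: int, quantize_between: bool) -> int:
    """Run one 8-bit input level through two passes."""
    value = level / 255 * 0.25            # pass 1: darken to 25%
    if quantize_between:
        value = to_8bit(value) / 255      # intermediate stored at 8 bits
    return to_8bit(value * 4.0)           # pass 2: brighten back up


def distinct_levels(quantize_between: bool) -> int:
    """Count how many output shades survive for a full 0..255 gradient."""
    return len({darken_then_brighten(level, quantize_between)
                for level in range(256)})


if __name__ == "__main__":
    print("float intermediates:", distinct_levels(False))
    print("8-bit intermediates:", distinct_levels(True))
```

      Fewer surviving shades in a smooth gradient is what shows up on screen as bands.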

    16. Re:What comes next. by Anonymous Coward · · Score: 0

      That's not 32 bits per component you know. 8-bit floats?? urghhh no wonder you'd see banding.

    17. Re:What comes next. by Animaether · · Score: 1

      'Half' actually doesn't have anything much to do with Renderman. And it's not just plain 16bits/channel. Perhaps you're confusing Pixar (renderman) with ILM ?

      Read more on the 'Half' datatype at OpenEXR.org

    18. Re:What comes next. by renoX · · Score: 1

      Thanks for clearing things up, in fact it is 64 or 128 bit for the total!

      We're still at 16 or 32 bit per component, which is what matters, as AFAIK there is no way to combine the component to do a floating point calculation with a precision of 64bit.
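
      To get a feel for what 16 or 32 bits per component means, here is a small Python sketch that round-trips values through the IEEE 754 half, single and double formats using the standard `struct` module. This is an assumption for illustration: the formats on the GPUs discussed here may not match IEEE 754 exactly.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def relative_error(value: float, fmt: str) -> float:
    """Relative error introduced by storing `value` in `fmt`."""
    return abs(round_trip(value, fmt) - value) / abs(value)


if __name__ == "__main__":
    formats = (("half (16-bit)", "<e"),
               ("single (32-bit)", "<f"),
               ("double (64-bit)", "<d"))
    for name, fmt in formats:
        print(name, relative_error(0.1, fmt))
```

      The half format also stops representing every integer exactly just above 2048, which matters if the values are used as indices or counters.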

  15. It's nice, but could be nicer by Anonymous Coward · · Score: 5, Informative

    Before you get excited, just remember how asymmetric the AGP bus is. Those GPUs will be put to much better use when we get them as 64-bit PCI cards.

  16. Re:Not the Point by Amiga+Lover · · Score: 4, Insightful

    The whole point of graphic cards is that they have a dedicated purpose. Using the cards for anything that is general purpose is like using a motorcycle to tow a pop-up camper.


    What's relevant is that to the processor on a graphics card, its dedicated purpose is simply a bunch of logic. There's no dedicated "this must be used for pixels only, all else is waste" logic inherent in the system. there are MANY purposes for which the same/similar logic that applies in generating 3D imagery can be used, and that seems the purpose of this paper. Run THOSE type operations on the GPU. Some things they won't be able to do well no doubt - but those they can, they can do extremely well.

  17. Not just the GPU : the RAM by ratboot · · Score: 5, Interesting

    What's interesting about new video cards is their memory capacity: 128 or 256 MB, accessible on some new cards at 900 MHz over a 256-bit data path (which is a lot faster than a CPU with DDR 400 installed).
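
    A back-of-the-envelope Python sketch of that comparison. The assumptions are mine, not the poster's: the 900 MHz figure is treated as the effective transfer rate, and DDR 400 is taken as a single 64-bit channel at 400 million transfers per second. These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    video = peak_bandwidth_gb_s(900, 256)   # video card memory, as quoted
    system = peak_bandwidth_gb_s(400, 64)   # single-channel DDR 400 (assumed)
    print(f"video memory: {video:.1f} GB/s")
    print(f"DDR 400:      {system:.1f} GB/s")
    print(f"ratio:        {video / system:.1f}x")
```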

    1. Re:Not just the GPU : the RAM by Indy1 · · Score: 1

      It's also a big reason why high end video cards cost so much. That uber high speed ram has an uber high price.

      --
      Lawyers, MBA's, RIAA? A jedi fears not these things!
    2. Re:Not just the GPU : the RAM by Anonymous Coward · · Score: 0

      Is there some reason why you had to use uber? You're not German, and it makes you sound like a dick. Just use 'very'. What the fuck is wrong with 'very'?

    3. Re:Not just the GPU : the RAM by Anonymous Coward · · Score: 0

      What are you, some kind of German Nazi?

    4. Re:Not just the GPU : the RAM by dasmegabyte · · Score: 1

      What other kind of Nazi is there?

      --
      Hey freaks: now you're ju
    5. Re:Not just the GPU : the RAM by mwvdlee · · Score: 1

      Seems like a good idea, though the bus to the card is probably a lot slower than a dedicated memory bus, and fully utilizing the bus could be troublesome when using other extension cards. It may still be nice as "swap" memory, and probably faster than a swapfile on a hard disk. Anybody care to do some math on this?

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    6. Re:Not just the GPU : the RAM by drinkypoo · · Score: 2, Interesting
      The part that annoys me is that it's all the same speed. Texture memory doesn't have to be near as fast as video memory and furthermore you could have two classes of texture memory, which will make sense as video cards reach and exceed 512MB. There have in the past been video cards with high speed video memory and something like EDO for textures, which makes a lot of sense, especially if you're willing to cache most-used textures somewhere in video memory.

      Is it just me, or should the cards have maybe 64 or 128MB of high speed memory, and then a couple of DIMM slots that take ordinary DDR SDRAM? That would still be pretty fast stuff, especially if the cards had dual-channel memory controllers, and plenty fast enough for textures. The card could cache the most-used textures in whatever video memory was left after drawing screens.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  18. Wow by cubicledrone · · Score: 5, Interesting

    All that processing power, and the latest games still run at about 22 frames per second, if that.

    The CPU can do six billion instructions a second, the GPU can do 18 billion, and every last cycle is being used to stuff a 40MB texture into memory faster. What a waste. Yeah, the walls are even more green and slimy. Whoop-de-fucking-do.

    Wouldn't it be great if all that processing power could be used for something other than yet-another-graphics-demo?

    Like, maybe some new and innovative gameplay?

    --
    Business isn't willing to pay for products, innovation and careers, so we get brands, mortgage commercials and layoffs.
    1. Re:Wow by PitaBred · · Score: 4, Insightful

      You don't seem to understand that GPUs are very specific purpose computing devices. They aren't general purpose processors like your CPU. They crunch matrices, and that's about it. Even all the programmable stuff is just putting parameters on the matrix churning.
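
      For anyone wondering what "crunching matrices" looks like, here is a minimal Python sketch of the classic case: moving a vertex by multiplying it with a 4x4 matrix in homogeneous coordinates. The helper names are invented for illustration, and a real GPU does this in hardware for many vertices at once.

```python
def translation(tx: float, ty: float, tz: float) -> list:
    """4x4 matrix that shifts a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def mat_vec4(matrix: list, vector: list) -> list:
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(matrix[row][col] * vector[col] for col in range(4))
            for row in range(4)]


if __name__ == "__main__":
    vertex = [1, 1, 1, 1]                       # (x, y, z, w) with w = 1
    print(mat_vec4(translation(1, 2, 3), vertex))
```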

    2. Re:Wow by Anonymous Coward · · Score: 0

      What a waste.

      Ugh, when will you people SHUT UP? I'm sick of hearing this same fucking refrain every time there's a story about a new graphics technique for games, movies, or whatever. Let me give you a hint: it's not easy to come up with "new and innovative gameplay" any more than it is easy to come up with new and innovative paintings, sculptures, etc. That's the reason true innovations are few and far between, and it applies to games just as to every other creative medium we humans participate in. So stop your incessant whining about how the newest whiz-bang graphics aren't going to expand your mind or whatever, because they aren't meant to.

      Mike

    3. Re:Wow by Anonymous Coward · · Score: 0
      Yes, and no. GPUs are actually pretty damn close to being "general purpose" processors in their own right. The issue is that for some things, the GPU is significantly faster than a CPU, whilst for other things, the CPU is significantly faster. It's all about what the processor (yes, they are all processors, CPU and GPU both...) is designed to do. GPUs are designed to accelerate 3D graphics, which means lots and lots of matrix calculations. CPUs are designed to be general purpose devices.

      It's like hitching up a prime mover to your caravan. It can do it (assuming an appropriate coupling device), but it's not designed to do it. You're better off using a sedan with enough grunt. The prime mover's better used for moving massive loads -- several tonnes -- which a sedan simply can't shift in any reasonable way.

    4. Re:Wow by Tezkah · · Score: 0

      This is why I love consoles, they can't simply make a special tech demo, as that gets old in the first 6 months of new hardware, then they have 4 and a half years of creating actual good games instead of fancy effects (although some game makers dont get this hint, and things like FFX are still fancy tech demos :O )

  19. audio stuff by RobPiano · · Score: 4, Interesting

    At my work we do audio stuff. It would be really neat if I could do some of the more complicated audio analysis (FFT etc) that requires lots of vector math using the video card's GPU. There is probably even some way you could sync the timing for multimedia stuff.

    I know nothing about CPU design though
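
    To show why this kind of analysis counts as "vector math", here is a Python sketch of the plain discrete Fourier transform written as one matrix-vector product. This is the slow O(n^2) DFT, not an FFT, and it runs on the CPU; it only illustrates the shape of the computation that would be handed to a GPU.

```python
import cmath


def dft_matrix(n: int) -> list:
    """n x n matrix whose entry (k, t) is exp(-2*pi*i*k*t/n)."""
    return [[cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)]
            for k in range(n)]


def mat_vec(matrix: list, vector: list) -> list:
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dft(signal: list) -> list:
    """Frequency-domain coefficients of `signal`."""
    return mat_vec(dft_matrix(len(signal)), signal)


if __name__ == "__main__":
    for coefficient in dft([1.0, 0.0, -1.0, 0.0]):
        print(coefficient)
```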

    1. Re:audio stuff by Hast · · Score: 3, Informative

      Look at gpgpu.org; I believe they have papers on doing FFT on GPUs. They also have a collection of papers on using GPUs as CPUs.

    2. Re:audio stuff by zsazsa · · Score: 2, Informative

      It would be really neat if I could do some of the more complicated audio analysis (FFT etc) that requires lots of vector math using the video cards gpu.

      There's a company that actually does this. The Universal Audio UAD-1 audio DSP had a previous life as a video card and a DVD hardware accelerator. Check out this thread on the UAD forums for more technical information.

    3. Re:audio stuff by rjshields · · Score: 1

      here's a link gpgpu.org

      you can do that by typing <a href="http://gpgpu.org">gpgpu.org</a>

      --
      In this world nothing is certain but death, taxes and flawed car analogies.
    4. Re:audio stuff by Anonymous Coward · · Score: 1, Interesting

      I use MATLAB at work a lot. It runs a mathematical simulation of a system. Lots of digital communications people do it. I wonder if a system can be converted into a matrix and then solved numerically.
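
      As a sketch of what "converted into a matrix and then solved numerically" can mean for a linear system, here is Gaussian elimination with partial pivoting in plain Python. It is one textbook method, chosen for illustration; it says nothing about how MATLAB or a GPU would do it.

```python
def solve(a: list, b: list) -> list:
    """Solve a*x = b for x by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


if __name__ == "__main__":
    # 2x + y = 3 and x + 3y = 5
    print(solve([[2, 1], [1, 3]], [3, 5]))
```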

    5. Re:audio stuff by baxissimo · · Score: 1

      Unfortunately many of the papers about things like FFT and multigrid on GPU actually run slower on GPU than CPU. But the argument is that the performance of GPUs is growing so rapidly (aka "Moore's law squared") that before too long they will run faster on the GPU. The new NV40/R800 class of GPUs may have already tipped the scales. The ability to now read textures from within vertex programs and branching instructions should also allow some algorithmic improvements to be made to some GPGPU techniques.

  20. Maybe that's the answer... by rsilvergun · · Score: 0, Offtopic

    to all our compatibility woes. I keep hearing about how much faster G5s and Alphas are than x86s, but it doesn't really matter if it won't run the apps I want. Now that processors are so cheap, why not just throw an x86 in for compatibility and then start over with a better design? Kinda like what the PS2 does so it can play PS1 games (I think).

    --
    Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
    1. Re:Maybe that's the answer... by John+Starks · · Score: 0, Offtopic

      This is offtopic, but...

      You keep hearing that they're faster because people don't like to admit they spent too much for inferior technology. Yes, arguably the x86 instruction set is inferior to newer, better engineered ones. But the newest offerings from Intel and AMD eclipse the G5 in speed. Apple failed to follow Spec guidelines when they released their benchmarks, thus allowing them to claim the performance crown unfairly.

      Read about it if you're unconvinced. This news has been floating around. I like Macs, but please don't spread lies for Apple. They're efficient enough without you.

      I see this a lot on Slashdot, and I try to always call people on it. I think people are just uninformed, but I also think the general Mac bias here influences it as well.

    2. Re:Maybe that's the answer... by trg83 · · Score: 3, Interesting

      From the link you mentioned: "while Apple used a compiler you've never heard of (at least in the x86 world)."

      My understanding is that they used GCC.

      Further, "Another said that some version of Linux had to be used to compare apples to apples. Well, MacOS X isn't Linux, and the desktop standard for x86 machines is Windows (not that using a properly optimized Linux bothered the Opterons very much). You want to know what machine is fastest, you test in their native environment."

      Oh, silly me. Processors are so obviously made to run only one operating system!

      I'll take this site's info with a grain of salt.

    3. Re:Maybe that's the answer... by John+Starks · · Score: 3, Insightful

      GCC is an inferior compiler for the x86, whether you like it or not. Intel's optimizing C/C++ compiler is much faster according to numerous benchmarks (I'm sorry, it's too late to find the links.) On the other hand, I understand that GCC is great on the Mac, since Apple optimized it properly. (Certainly I appreciate the hard work of the various GCC teams over the years; hopefully new optimizations will continue to improve the quality of the release until it is as fast as Intel's offerings.)

      In any case, why do you believe all of Apple's conveniently high numbers, but you don't believe Spec numbers reported by Dell, AMD, etc.? These are not numbers pulled out of a hat; they are standard Spec results. Thus, the numbers should be comparable from company to company. But Apple retested other companies' products and released new numbers without properly optimizing for the x86. Why is it when Microsoft pays for benchmarks, people freak out, but when Apple PERFORMS benchmarks, people believe them instantly?

      There are plenty of other links out there that provide similar information. It is patently false advertising for Apple to claim that they use the fastest chip of any PC.

      Oh, and re: the Linux issue, you're right. But you'll find that the x86 is faster in Linux with a proper optimizing compiler.

      My issue is basically that at best -- at best! -- the results are inconclusive. At worst, Apple blatantly lied. It's foolish to believe Apple blindly just because they're the underdogs and produce a pretty, Unix-based OS. And it's foolish to hold this strange hatred for all that is x86. I don't understand this mentality.

    4. Re:Maybe that's the answer... by Trepalium · · Score: 1
      Unfortunately for Apple fans, the numbers still stand on that site. Dell's own testing shows much better performance with the benchmark than Apple's does, and even makes the Dell machine win the benchmark. On the other hand, if they used GCC to compile the benchmark for the Dell machines, that might explain why they got such cruddy results. It's a widely accepted fact that GCC's code generation on CPUs with limited numbers of registers is pretty poor in terms of performance.

      Of course, if you don't trust that website, how about ZDNet, or even compare the numbers yourself. There's Veritest's Apple numbers, versus the official published numbers from SPEC. There's also this site which goes into detail about the benchmark. They used -ffast-math on PPC, but not on x86, for instance. They explicitly turned off hyperthreading, which obviously hurt the Dell machine during the MP tests.

      Then again, as the old saying goes, there's three types of lies. Lies, damn lies and benchmarks.

      --
      I used up all my sick days, so I'm calling in dead.
    5. Re:Maybe that's the answer... by Anonymous Coward · · Score: 0

      The point of SPEC is that vendors submit their own results.

      Apple ignored Dell's Official scores, and instead made up their own scores for Dell machines that were 2/3s of the real ones. By spec.org, Dell kicks Apple's ass, but that's not what Cokey Stevey told his minions.

      Apple's marketing here is seriously scummy in an unprecedented manner, and basically opens the door for any vendor to make up any scores for their competitors. Imagine if Dell published results for the PMac G5 showing them to be 50% slower than a Pentium. Aw you'd hate that wouldn't you?

      Not that it matters, because the only people who give a shit are already confirmed Apple Homosexuals.

    6. Re:Maybe that's the answer... by persist1 · · Score: 1

      The parent is written:

      Why is it when Microsoft pays for benchmarks, people freak out, but when Apple PERFORMS benchmarks, people believe them instantly?

      ...Uhh, maybe because Microsoft is a lot less respectful of... practically every group (out of users, developers, vendors, customers) with which it comes into contact, as compared to Apple?

      Also, when you don't obfuscate the sponsor of the test, accountability in the face of bullshit is a lot easier to determine, hmm?

      ...Not saying Apple doesn't have its moments (such as when it decides to compete with third party software titles), but even then, come on.

      --
      ...When in doubt, think for yourself.
    7. Re:Maybe that's the answer... by phatsharpie · · Score: 2, Interesting

      Actually, GCC may have optimization for the G5, but it is far from being optimal:

      The compiler that seems to be best/fully optimized for the G5 is the new IBM XL compilers, released at the beginning of the year.

      http://forums.macnn.com/showthread.php?s=&threadid=197118

      There don't seem to be many benchmarks done using it yet, but all information points to a significant gain in performance when using the IBM compiler versus GCC (not surprising, since IBM built the chip). The only benchmark I can find is from a German site:

      http://www.heise.de/ct/Redaktion/as/spec/ct0408230/

      I don't believe the G5 is indeed the "fastest" personal computer in the world as claimed by Apple, but it certainly is comparable to the best in the x86 world. Not to mention it is a very new architecture, and there are still plenty of optimization that can be made to make it faster. But to claim that GCC is fully optimized for the G5, and that Apple was using it to justify its claim of being the "fastest" is incorrect. It used a compiler that is arguably good, but certainly not excellent for it.

      In regards to comparing Mac OS X to Linux rather than Windows. I think the comparison is valid considering the market Apple has been targeting recently. Apple seems to have backed off from wooing the MS crowd, but instead focusing on firms that use UNIX workstations. Apple wants these companies to switch to the PowerMac rather than to a x86/Linux platform. This is highlighted by their advocacy of using OS X for biotech and film/video effects production. I remember one of their earlier OS X ad even told the reader to send all of their old UNIX boxes to "/dev/null" - or something like that.

      -B

    8. Re:Maybe that's the answer... by tomstdenis · · Score: 0

      "GCC is an inferior compiler for the x86, whether you like it or not. Intel's optimizing C/C++ compiler is much faster according to numerous benchmarks"

      This simply is not true. As with all compilers [or benchmarks], good at one thing is not good at all things.

      I found that even with profiling the Intel compiler [v8] is better at things like MP math [libtommath] but the same and worse at other crunching [ciphers/hashes in libtomcrypt].

      In fact the Intel C compiler doesn't make much use of SSE2 properly at all [hint: SSE2 is like MMX but on the 128-bit XMM registers] which can be used to speed up 32-bit ALU work [more registers to play with]. The most I've seen is it uses SSE to clear temps in memory.

      Not saying that the compiler isn't good. It's fast [to compile] and all. Just saying that GCC isn't that "lost" compared to it.

      Tom

      --
      Someday, I'll have a real sig.
    9. Re:Maybe that's the answer... by CracktownHts · · Score: 1
      Oh god, please not Ed Stroglio. That guy is a clueless idiot. I used to read that site for the cooling fetish stuff, but I had to stop because his woefully misinformed opinions, coupled with his unbelievably bad writing, made me want to claw my eyes out.

      Why don't you give us a quote from Rob Enderle instead? At least he's sort of famous.

      Thanks for dredging up some painful memories.

    10. Re:Maybe that's the answer... by trg83 · · Score: 1

      "GCC is an inferior compiler for the x86, whether you like it or not"

      If you go back and read my post, I never argued that point at all. I just think it is very hard to trust any source that shows complete naivete in regards to the GCC compiler. Great or not, it is quite prevalent in the Unix world. Saying it's a compiler you have never heard of implies lack of experience on the part of the writer.

      Furthermore, I believe all of these benchmarks are skewed. If you follow the money trail long enough, you'll always find someone with something to gain. The only suitable benchmark is personal experience. The psychology field would tell you that sometimes an inferior option seems superior to someone just because they want so much for it to be the case. In that case, if the user is satisfied/thrilled, to quote Joey from Friends, which is better is "...a moo point...Like a cow's opinion...It doesn't matter"

    11. Re:Maybe that's the answer... by HuguesT · · Score: 1

      This is like the Java apologists who come up with a silly Fibonacci benchmark to show that *in some instances* Java can be faster than C. Except that if you rewrite the C a little bit then it smokes Java again.

      In real-world applications as opposed to toy benchmarks, the Intel compiler is at best 5% faster than GCC, and yes SPEC benchmarks are toys. Hardware companies come up with special compilers for these benchmarks that basically work on nothing else. Remember that the SPEC numbers are posted by the companies, not by independent third parties.

      I personally undertook to port a large image analysis library to work well with icc after very optimistic benchmarks, and at the end the speedup was not even noticeable. This was after a lot of effort to write the programs in such a way that the compiler could detect loops that were parallelizable, etc, and this was after making sure that the compiler *was* doing the parallelization (it said so while compiling). Needless to say this was very disappointing.

      In our application all that was left was to try and optimize the various algorithms, and we did get a factor-of-two speedup from that, at the expense of generality, which was all we were after.

      There is no doubt that icc optimizes better, but for general-purpose computation it is not all it's cracked up to be. Furthermore it's not hard to find benchmarks where gcc actually performs better than icc.

      In summary, don't believe the benchmarks, do your own studies on your own software.
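
      In that spirit, a minimal Python sketch of timing your own code rather than trusting published figures. It uses the standard `timeit` module; `best_time` and the two sample workloads are made-up names for illustration, and the same idea applies to compiled code with whatever timer your platform offers.

```python
import timeit


def best_time(func, repeat: int = 5, number: int = 1000) -> float:
    """Best observed seconds per call of `func` over several runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number


def sum_with_loop() -> int:
    total = 0
    for value in range(1000):
        total += value
    return total


def sum_with_builtin() -> int:
    return sum(range(1000))


if __name__ == "__main__":
    print("loop:   ", best_time(sum_with_loop))
    print("builtin:", best_time(sum_with_builtin))
```

      Taking the minimum of several runs reduces noise from whatever else the machine is doing, but the result still only describes this code on this machine.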

  21. This is BIG by macrealist · · Score: 5, Insightful

    Creating a way to use the specialized GPUs for vector processing that is not graphics related is ingenious. Like a lot of great ideas, it is sooo obvious AFTER you see someone else do it.

    Don't miss the point that this is not intended for general purpose computing. Don't port OoO to the graphics chip.

    Where it is huge is in signal processing. FPGAs have begun replacing even the G4s in this area recently because of the huge gains in speed vs. power consumption an FPGA affords. However, FPGAs are not bought and used as is, and end up costing a significant amount (of development time/money) to become useful. Being able to use these commodity GPUs for vector processing creates a very desirable price/processing power/power consumption option. If I were nVIDIA or ATI, I would be shoveling these guys money to continue their work.

    --
    I am living proof of the Peter Principle
    1. Re:This is BIG by Anonymous Coward · · Score: 0

      I'm hoping for the day when PC hardware and OSes support Reconfigurable Hardware Devices. I think these would be a HUGE help to simulation/rendering farms where the software vendor could provide a netlist and tell the CPU to send some long-term calculations off to the FPGAs.
      The closest we've gotten was the DEC Pamette which had 4 FPGAs on a 64-bit PCI card. The devices could be programmed using a C-like language and all the device drivers were included to get up and running fast.

    2. Re:This is BIG by Anonymous Coward · · Score: 0

      Creating a way to use the specialized GPUs for vector processing that is not graphics related is ingenious.

      Ummm, hardly.

      People have been using other processors in their computers to do things they weren't designed for for years. Like using the keyboard controller to reset the 286 to get it out of protected mode. Or using the 1541 drive's processor to do calculations on the side. You act like someone's just discovered a way to capture energy straight from the fucking sun or something.

      Headline: Man uses cold to harden water into solid form!

      macrealist: Oh wow, this is totally ingenious! Like everything, it's so obvious when you see someone else do it!

  22. Siggraph 2003 by Adam_Trask · · Score: 5, Informative
    Check out the publication list in Siggraph 2003. There is a whole section named "Computation on GPUs" (papers listed below). And the papers for Siggraph 2004 should be out shortly.

    If you have a matrix solver, there is no telling what you can do. And as I remember, these papers show that the GPU runs these matrix calculations faster than the CPU does.

    # Linear Algebra Operators for GPU Implementation of Numerical Algorithms
    Jens Krüger, Rüdiger Westermann

    # Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid
    Jeff Bolz, Ian Farmer, Eitan Grinspun, Peter Schröder

    # Nonlinear Optimization Framework for Image-Based Modeling on Programmable Graphics Hardware
    Karl E. Hillesland, Sergey Molinov, Radek Grzeszczuk
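
    To give a feel for why a solver like the conjugate gradient method named in the second title is a candidate for GPU work, here is a minimal CPU-only sketch in plain Python. It is not taken from any of the papers above; the function names and the tiny 2x2 test system are assumptions chosen for illustration. The point is only that the whole iteration is built from dot products, matrix-vector products and scaled vector additions, all of which apply the same arithmetic to many elements.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x starting at zero
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical test system: 4x + y = 1, x + 3y = 2
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)
```

    Every list comprehension in the loop touches each element independently, which is the shape of work a GPU handles well; the dot products need a reduction step on top of that.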

    1. Re:Siggraph 2003 by doktor-hladnjak · · Score: 1
      Actually, SIGGRAPH 2004 papers have been decided, although SIGGRAPH has obviously not published them yet and not everybody has their preprints up.

      Tim Rowley's SIGGRAPH 2004 Index has links to available preprints or you can go straight to the SIGGRAPH 2004 program for the official program. There's a category called "Large Meshes and GPU Programming" this year, although there might be GPU related papers in some other categories too.

  23. http://www.gpgpu.org/ is a great resource by aancsiid · · Score: 4, Interesting

    http://www.gpgpu.org/ is a great resource for general purpose graphics processor usage.

  24. here ya go by dave1g · · Score: 3, Informative

    some one else posted this...

    www.gpgpu.org

    Website on this topic (Score:0)
    by Anonymous Coward on Sunday May 09, @01:57AM (#9098550)
    General-purpose computation using graphics hardware has been a significant topic of study for the last few years. Pointers to a lot of papers and discussion on the subject are available at: www.gpgpu.org [gpgpu.org]

    1. Re:here ya go by Anonymous Coward · · Score: 1, Offtopic

      Wow. You blatantly copied a +5 post, even quoting the post number, from this very same discussion, and you yourself managed to get a +5 out of it! You, sir, are spectacular.

    2. Re:here ya go by dave1g · · Score: 1

      Hello?!?! I didn't hide it. I read the whole topic, saw this guy's question, and later on someone had a post that seemed to answer his question, so I copied the whole post, including its current score (which was a ZERO at the time, not plus 5) and everything else, instead of making a link to it.

      I wasn't trying to rack up karma or anything, just connecting someone who had a question to someone who had an answer.

  25. Not so... by oboylet · · Score: 4, Interesting
    High-powered GPUs can make for really good general-purpose devices.

    Apple's Newton had no CPU, only a GPU that was more than adequate.

    Ideas like these are good in general. I'd like to see the industry move away from the CPU-as-chief status quo. Amigas were years ahead of their time in large part because the emphasis wasn't as much on central processing. The CPU did only what it was supposed to do -- hand out instructions to the gfx and audio subsystems.

    Hardly using a "motorcycle to tow a pop-up camper." If anything, the conventional wisdom is, "when all you have is a hammer, everything looks like a nail."

    1. Re:Not so... by Anonymous Coward · · Score: 2, Informative

      Hmm. My Newton has a "160Mhz StrongARM SA-110 RISC Processor". Doesn't sound like a GPU to me.

    2. Re:Not so... by Anonymous Coward · · Score: 0

      Get back under your bridge, stupid troll.

      How is an ARM a GPU?

    3. Re:Not so... by Anonymous Coward · · Score: 0

      The "original" (read in house) newton ran on AT&T Hobbit CPUs, actually multiple Hobbits... After the original design got scrapped, Apple Changed to a single CPU.

      As far as I know Apple never installed a GPU on a Newton.

      Alphaseinor

  26. and a sourceforge project too by Lord+Prox · · Score: 4, Informative

    BrookGPU
    from the BrookGPU website...
    As the programmability and performance of modern GPUs continues to increase, many researchers are looking to graphics hardware to solve problems previously performed on general purpose CPUs. In many cases, performing general purpose computation on graphics hardware can provide a significant advantage over implementations on traditional CPUs. However, if GPUs are to become a powerful processing resource, it is important to establish the correct abstraction of the hardware; this will encourage efficient application design as well as an optimizable interface for hardware designers.

    From what I understand this project is aimed at making an abstraction layer for GPU hardware so writing code to run on it is easier and standardised.
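
    As a rough picture of what such an abstraction looks like, the sketch below mimics the "kernel applied to streams" idea in plain Python. This is not Brook syntax and not the BrookGPU API; `run_kernel` and `saxpy` are hypothetical names made up for illustration, and everything here runs on the CPU.

```python
def run_kernel(kernel, *streams):
    """Apply `kernel` independently at each element position of the input streams."""
    return [kernel(*elems) for elems in zip(*streams)]


def saxpy(a):
    """Build a kernel computing a*x + y for one pair of elements."""
    def kernel(x, y):
        return a * x + y
    return kernel


xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 20.0, 30.0, 40.0]
result = run_kernel(saxpy(2.0), xs, ys)
print(result)  # [12.0, 24.0, 36.0, 48.0]
```

    The kernel never sees its neighbours or any shared state, so the runtime is free to evaluate all positions in parallel. That restriction is what lets the same program map onto graphics hardware.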

    1. Re:and a sourceforge project too by WinterpegCanuck · · Score: 2, Interesting

      What about a general abstraction layer at the OS level? I am by no means at that programming level, but could you not have calculations that are proven to run well on GPUs (ints maybe?) be redirected by the OS, and the rest just sent to the CPU as normal? To me this would benefit all programs running on the system (except the games that want exclusive GPU use) instead of only those coded to take advantage of it. I know a few programs in the oil industry that could use all the bogomips they could get.

    2. Re:and a sourceforge project too by spectral · · Score: 1

      From what I understand, if the OS had that much control over every program's execution path, computers would run impossibly slow. The current method is to say "Ok, I'm the master program, I can do anything. Here, run this program, they can only do this stuff (or, conversely.. they can't do this other stuff). Call me back when they're done, or after 100ms have passed."

      This way the programs are protected from each other, can't do anything stupid unless the OS lets them, but still run at 'native' speed. If the OS was running future predictive decisions as to which processor was getting a specific chunk of code, that'd mean it's running that decision on every bit of code sent through it. Also, the setup costs for a GPU operation, and then the read back of the result, would kill this for 'general' computing. Only if there's a MASSIVE amount of data.
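
      The setup and read-back argument can be put as a back-of-the-envelope cost model. The sketch below is only that: the function names are made up, and the per-item and setup costs are hypothetical numbers in arbitrary time units, not measurements of any real card.

```python
def offload_pays_off(n_items, cpu_per_item, gpu_per_item, setup, transfer_per_item):
    """True if sending the work to the GPU beats keeping it on the CPU."""
    cpu_total = n_items * cpu_per_item
    gpu_total = setup + n_items * (gpu_per_item + transfer_per_item)
    return gpu_total < cpu_total


def break_even_items(cpu_per_item, gpu_per_item, setup, transfer_per_item):
    """Number of items at which both paths cost the same, or None if the GPU never wins."""
    saving_per_item = cpu_per_item - gpu_per_item - transfer_per_item
    if saving_per_item <= 0:
        return None
    return setup / saving_per_item


# Hypothetical costs: the GPU is 4x faster per item, but pays a fixed setup
# cost plus a copy cost for moving every item to the card and back.
costs = dict(cpu_per_item=4.0, gpu_per_item=1.0, setup=10000.0, transfer_per_item=1.0)
print(break_even_items(**costs))         # 5000.0
print(offload_pays_off(1000, **costs))   # False
print(offload_pays_off(10000, **costs))  # True
```

      With these made-up numbers a small job loses badly to the fixed overhead, and only data sets past the break-even size come out ahead.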

      This would technically be possible in virtualized systems like .net or java, but still impractical for general use. Compilers could also be written to take advantage of this, but again: why bother? The benefits, while great, aren't great enough to enough people that compiler writers would want to spend time on that instead of making normal things run better (like unrolling loops and using SIMD, more standards compliance, etc.)

  27. So when do we get unified memory? by Anonymous Coward · · Score: 2, Interesting

    Many of the problems stated in using a GPU for non-graphics tasks would be implicitly solved if the GPU and CPU shared memory. While this would slightly slow down the GPU's memory access, in 3 years, I don't think that would be an issue. Especially compared to the benefits of having only one memory pool.

    1. Re:So when do we get unified memory? by Dr.+Sp0ng · · Score: 1

      Many of the problems stated in using a GPU for non-graphics tasks would be implicitly solved if the GPU and CPU shared memory. While this would slightly slow down the GPU's memory access, in 3 years, I don't think that would be an issue. Especially compared to the benefits of having only one memory pool.

      Yeah, that's why Microsoft did that in the Xbox. Works real nice - for whatever operation you happen to be doing at the moment, you can choose between the CPU and GPU without copying memory around.

    2. Re:So when do we get unified memory? by pe1chl · · Score: 1

      Why would that not be an issue in 3 years?
      Some onboard video solutions that have been on the market for several years use this solution. However, it always incurs a performance penalty.

      I don't think that situation will improve because current CPUs are starved by memory bandwidth and clockrates are going up all the time. Memory bandwidth is increasing, but not as quickly as CPU clock.

    3. Re:So when do we get unified memory? by maxume · · Score: 1
      However, it always incurs a performance penalty.

      This is a rather interesting way to look at it. Presumably, it generates a rather nice boost in price/performance, or it would not be in active use?

      Your statement is similar to saying that putting a smaller engine in a car incurs a performance penalty. Obviously it does, but when you look at the overall impact(i.e. include cost) of the design decision, it might be a huge benefit...

      --
      Nerd rage is the funniest rage.
    4. Re:So when do we get unified memory? by pe1chl · · Score: 1

      There are advantages when considering parts count that are important to a-brand office pc manufacturers.
      However, the use of main memory as video memory puts a serious load on the main memory bandwidth for the refresh of the video.
      Don't under-estimate this. To refresh a 1024x768 screen at 85Hz and 24bpp (typical values for office systems) you need to transfer about 190 Mbyte/s.
      For a 1280x1024 resolution this increases to 318 Mbyte/s.
      Any bandwidth used for GPU operations adds to this.
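
      The figures above can be reproduced with a few lines of arithmetic. This sketch assumes 24bpp is stored as 3 bytes per pixel and that "Mbyte" means 2**20 bytes; the function name is made up for illustration.

```python
def refresh_bandwidth_mib(width, height, refresh_hz, bits_per_pixel):
    """Bandwidth needed just to scan out the framebuffer, in MiB/s."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * refresh_hz / 2**20


for width, height in [(1024, 768), (1280, 1024)]:
    rate = refresh_bandwidth_mib(width, height, 85, 24)
    print(f"{width}x{height} @ 85Hz, 24bpp: {rate:.2f} MiB/s")
# 1024x768 @ 85Hz, 24bpp: 191.25 MiB/s
# 1280x1024 @ 85Hz, 24bpp: 318.75 MiB/s
```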

    5. Re:So when do we get unified memory? by roie_m · · Score: 1

      I don't know the first thing about processors, so this may be a stupid question, but why not just go the whole way and put the GPU right on the CPU, like they did with FPUs (or MMX, etc.)? I.e., just add matrix-processing to the CPU's main instruction set.

  28. I can see it now.... by TypoNAM · · Score: 3, Interesting

    ...Several indies and companies figure out how to use the powerful GPU's in an efficient manner that would benefit everyone who uses computers on a daily basis and improves the usefulness of the computer making it the best thing in the world again then some greedy bastard comes along flashing his granted patent by the U.S. Patent Office which makes us all screwed...

    Ohh well the idea was good while it lasted. ;)

    --
    This space is not for rent.
    1. Re:I can see it now.... by Anonymous Coward · · Score: 0

      Fire Rumsfeld! Fire Rumsfeld!

  29. Re:Not the Point by Anonymous Coward · · Score: 0

    The whole point of graphic cards is that they have a dedicated purpose. Using the cards for anything that is general purpose is like using a motorcycle to tow a pop-up camper.

    Or using the Intel 8086 drive controller as a general purpose CPU?

  30. Imagine... by rokzy · · Score: 4, Interesting

    a beowulf cluster of them.

    seriously, we have a 16 node beowulf cluster and each node has an unnecessarily good graphics card in them. a lot of the calculations are matrix-based e.g. several variables each 1xthousands (1D) or hundredsxhundreds (2D).

    how feasible and worthwhile do you think it would be to tap into the extra processing power?

    1. Re:Imagine... by BiggerIsBetter · · Score: 2, Interesting

      It's a good idea if your datasets take a long enough time to process. You could run 6 or so cards (maybe 1 AGP super fast, 5 PCI slowish (eg FX5200)) in your machine and send a dataset to each GPU and the main CPU, then get the results back. The trick is to keep them working without blowing all your bandwidth or PSU. Also depends on the resolution required, because the GPU is only 32 bits FP, compared to 80 bits for the CPU.

      All I can suggest is download the Brook libraries and try it out. See if it helps, and see if the results are accurate enough. And yes, Fortran can be used if you can bind it - Intel's compiler suite worked for me.

      --
      Forget thrust, drag, lift and weight. Airplanes fly because of money.
    2. Re:Imagine... by PitaBred · · Score: 1

      translate your matrix tweaking into OpenGL calls and try it out. I've heard very good things, if you can deal with the limited precision. In C/C++ parlance, it'll do float processing, not double precision, but many times simple floating point is all you need.
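
      To see what the limited precision means in practice, the sketch below emulates 32-bit float storage on the CPU by round-tripping values through the standard `struct` module. It does not touch a GPU or OpenGL at all; `to_float32` and `accumulate` are made-up helper names, and the loop is just one example of where single precision drifts.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, single_precision):
    """Add `step` to a running total `count` times."""
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_float32(total)  # emulate storing the sum in a float
    return total


single = accumulate(to_float32(0.1), 100_000, single_precision=True)
double = accumulate(0.1, 100_000, single_precision=False)
print("float :", single)
print("double:", double)

# A 32-bit float has a 24-bit significand, so 2**24 + 1 cannot be stored.
print(to_float32(16777216.0 + 1.0) == 16777216.0)  # True
```

      Whether the drift in the float total matters depends entirely on the calculation, which is why trying your own data is the sensible test.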

  31. When... by alexandre · · Score: 2, Insightful

    ...will someone finally port john the ripper to a new video card's graphical pipeline? :)

  32. How will they converge? by Thinkit4 · · Score: 1

    Anybody can see that this is all coming together someday. What is needed is a way to change the circuitry to approach whatever n-bit problem you need solved. Graphics is around 80h bit. Sound might be sixteen bit.

    The future should be more elegant and flexible. Drop your precision and instantly gain speed. We'll wonder why we dealt with graphics drivers and other such complications.

    --
    -I am an elective eunuch.
  33. Not the Point-headbanger. by Anonymous Coward · · Score: 1, Insightful

    There is however one thing to keep in mind. Presently our GPU's may have the headroom to play with, but with Apple's Quartz, and Microsoft's Longhorn, let alone what's coming with X. That headroom may disappear, and our video cards will have to go back to being video cards.

    1. Re:Not the Point-headbanger. by Amiga+Lover · · Score: 4, Insightful

      There is however one thing to keep in mind. Presently our GPU's may have the headroom to play with, but with Apple's Quartz, and Microsoft's Longhorn, let alone what's coming with X. That headroom may disappear, and our video cards will have to go back to being video cards.

      On those operating systems that require them, that could very well be.

      Still makes a nice thought that a linux box without even X installed, but a kickass graphics card, could crunch away doing something 4 times quicker than any windowed machine.

  34. Altivec by ensignyu · · Score: 1, Interesting

    I'm curious how GPUs stack up against the Altivec engine in G4/G5s.

    1. Re:Altivec by John+Starks · · Score: 2, Informative

      I would guess the difference would be comparable. Altivec is no more impressive than the SSE/SSE2/etc. types of instructions of the modern x86.

    2. Re:Altivec by Anonymous Coward · · Score: 1, Insightful

      I disagree. The performance of Altivec-aware apps doing heavily vectorized computation shows that they beat similar apps on the Wintel side over and over, even when the Wintel machines run at higher MHz (although not by as much as Apple claims). I know that other factors such as optimization, the non-vector code, etc. influence the outcome, but in the absence of true vector computation benchmarking, I can accept that Altivec is better than SSE2.

      Now, compared to GPUs, I think SIMD instructions suck. Why do you think 3D games utilize GPUs rather than Altivec or SSE2? In general, you can't compare the performance of one part of a general utility chip to a specifically designed chip tuned to gain the highest performance without having to worry about trade-offs.

    3. Re:Altivec by gumbi+west · · Score: 1

      Using a program that calculates fractals and tells you exactly how many flops it gets, I'm able to get about 3.8 flops/Hz (realized) on all my G4 computers (I don't own a G5). Other, non-demonstrative programs exist, but do not tell you exactly what speed they are going at.

  35. Pseudo repost by grape+jelly · · Score: 4, Informative

    I thought this looked familiar:

    http://developers.slashdot.org/developers/03/12/21/169200.shtml?tid=152&tid=185

    At least, I would imagine most of the comments would be the same or similar....

  36. Apple's getting there by blackula · · Score: 0, Redundant

    Apple, innovative as always, is already making headway. Not in the way that paper describes, per se, but in other ways.

    They, of course, designed the first OS that takes advantage of the user's GPU in situations other than CAD and games; in situations for general purpose computing. The OS uses the GPU to render the UI; obvious sounding at first glance, but revolutionary in practice.

    Small steps though they may be, Apple is, as seemingly always is the case, ahead of the game.

  37. Finally by Pan+T.+Hose · · Score: 5, Funny

    Using GPUs For General-Purpose Computing

    I'm glad that finally they started to use the General-Purpose Unit. What took them so long?

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
    1. Re:Finally by Knightmare · · Score: 1

      What do you expect, he's a phd, you can't expect him to be based in the real world...

    2. Re:Finally by Anonymous Coward · · Score: 0

      Eye Gradated 4th gread, Ey rael whirld.

    3. Re:Finally by Laebshade · · Score: 1

      If you want to get picky and nasty, I can pick on you too. It's actually "Graphics Processing Unit", like "Central Processing Unit".

  38. Obligatory joke by Anonymous Coward · · Score: 0

    In Soviet Russia CPU outperforms GPU by factor 3.2x.
    ;-)

    1. Re:Obligatory joke by Anonymous Coward · · Score: 0

      I'm rolling on the floor at the moment. Would you believe?

    2. Re:Obligatory joke by SmackCrackandPot · · Score: 1

      In the early days of 3D accelerator cards, CPU clock speeds were increasing at such a rate that last season's card was referred to as a "graphics decelerator", with S3 probably being the most quoted.

  39. Maybe time for a new generation of math-processor? by Anonymous Coward · · Score: 4, Insightful

    Remember the co-processors? Well, actually I don't (I'm a tad too young). But I know about them.

    Maybe it's time to start making co-processing add-on cards for advanced operations such as matrix mults and other operations that can be done in parallel on a low level. Add to that a couple of hundred megs of RAM and you have a neat little helper when raytracing etc. You could easily emulate the cards if you didn't have them (or need them). The branchy nature of the program itself would not affect the performance of the co-processor since it should only be used for calculations.

    I for one would like to see this.

  40. Re:Not the Point by kfg · · Score: 5, Funny

    Dude, you obviously have never tried to sleep in a motorcycle.

    KFG

  41. Documentation by Detritus · · Score: 2, Interesting

    Do any of the video chip manufacturers make free and complete documentation available for their GPUs? Everything that I have read in the past has said that they are encumbered with NDAs and claims of trade secrets. I'd prefer not to waste my time dealing with companies that treat their customers as potential enemies.

    --
    Mea navis aericumbens anguillis abundat
    1. Re:Documentation by Hast · · Score: 1

      No they are not. But you can access the stuff you need through the shader units which are documented. Eg if you use ARB_2 for OpenGL you can use the programs for both ATI and nVidia cards (which support shaders 2.0 or higher).

  42. Frogger by BiggerIsBetter · · Score: 4, Interesting

    Some dude wrote Frogger almost entirely in pixel shaders. http://www.beyond3d.com/articles/shadercomp/results/ (2nd from the bottom).

    --
    Forget thrust, drag, lift and weight. Airplanes fly because of money.
    1. Re:Frogger by jay-be-em · · Score: 0

      Wow, the game logic is actually implemented by the shaders which are passed key presses...

      That's really f'n cool :)

      --
      "Orthodoxy means not thinking--not needing to think. Orthodoxy is unconsciousness." --Eric Blair
  43. It doesn't have to be asked.... by rusty0101 · · Score: 1

    ... but hopefully the 'overrated' mods won't act as double negatives....

    With all the talk about how the GPU's are so great at Matrix calculations, the question should not be "What can my GPU do when it's otherwise 'Idle'?", but "What is the Matrix?"

    -Rusty

    --
    You never know...
    1. Re:It doesn't have to be asked.... by Anonymous Coward · · Score: 0

      You know, that could actually be what the cards are doing when they're idle! Rendering the Matrix!

      A glitch is whenever you put a GeForce and an SB Live in an AMD machine. For anyone who's been there..

  44. Bass Ackwards? by Anonymous Coward · · Score: 5, Insightful

    Perhaps offloading the CPU to the GPU is the wrong way to look at things? With the apparently imminent arrival of commodity (low power) multi-CPU chips, maybe we should be considering what we need to add to perform graphics more efficiently (ala MMX et al)?

    While it's true that general purpose hardware will never perform as well as or as efficiently as a design specifically targeted to the task (or at least it better not), it is equally true that eventually general purpose/commodity hardware will achieve a price-performance point where it is more than "good enough" for the majority.

  45. Re:It's Beer + 12:00 thirty by Anonymous Coward · · Score: 0

    thought you said you were out of here? Liar...

    sigh.. no woman at all

  46. SETI by ryanw · · Score: 1, Interesting

    I wonder what SETI could do with the extra cycles in parallel with the CPU. Is it possible to get 2x or 3x the crunching of data for SETI clients?

  47. Re:It's Beer + 12:00 thirty by Anonymous Coward · · Score: 0

    I think most dinks would be decimated by a .22 pistol blast. A .22 is nothing to scoff at really.

  48. Violation of Compartmentalization by BlakeB395 · · Score: 2, Insightful

    From a design standpoint, I can imagine a GPU that donates its power to the CPU would be a nightmare. It violates the fundamental tenet that everything should do one thing and do it well. OTOH, that tenet focuses on simplicity and maintainability over performance. Is such a tradeoff worth it?

    1. Re:Violation of Compartmentalization by anti-NAT · · Score: 1

      It violates the fundamental tenet that everything should do one thing and do it well.

      I don't necessarily think it does. The GPUs are very good at certain data processing operations. The only applications these operations have recently been applied to are 3D games and real time graphics. This just widens the scope of what the optimisation is being used for.

      As an analogy, tractors were pretty much designed to pull a plough, which was the immediate problem being solved. However, the real need was to have a device that can "pull a lot of weight at a reasonably slow and controllable speed".

      I've seen tractors being used to launch large surf lifesaving row boats at the beach. Obviously tractors weren't originally designed for that use, however, the real need ("pull a lot of weight at a reasonably slow and controllable speed") is exactly the same.

      I don't think, in the case of a GPU or a tractor, using it to solve other problems with similar or the same needs is contrary to the compartmentalisation tenet.

      --
      The Internet's nature is peer to peer - 20050301_cs_profs.pdf
    2. Re:Violation of Compartmentalization by evilviper · · Score: 3, Insightful
      It violates the fundamental tenet that everything should do one thing and do it well.

      No, having a CPU that does everything is what violates the tenet.

      I don't know about you, but I don't have a chip that does my video processing for me, I don't have a chip that does all the encryption for me, I don't have a chip that handles (en/de)capsulating network traffic, as well as handling interrupts and routing.

      Having a second processor that does some specialized work that a CPU isn't good at is an improvement, not a nightmare. I'd love to be able to plug a chip or two into my PC and have them do better-than-realtime MPEG-4 encoding that doesn't affect my processor at all... Who wouldn't?
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    3. Re:Violation of Compartmentalization by Deliveranc3 · · Score: 1

      Um we're dealing with people with powerful graphics cards.

      To them Tim Taylor is a power chasing panzy. OUGH OUGH OUGH

    4. Re:Violation of Compartmentalization by BoneFlower · · Score: 1

      For some applications it will certainly be useful, for others it won't be.

      I've been known to use 'cat' as a text editor(writer? the editing facilities of cat kinda suck) at times. In some(rare) instances, it can get the job done very quickly for that use.

      If there are other tasks that need the sort of processing applicable to 3D graphics, then using the GPU is not a problem, though the programming I imagine might be a little complicated.

    5. Re:Violation of Compartmentalization by be-fan · · Score: 1

      Since when is that a fundamental principle? It's a good design rule of thumb, but when you can get better characteristics from some other configuration, then it's a good engineering decision to use that other configuration.

      --
      A deep unwavering belief is a sure sign you're missing something...
    6. Re:Violation of Compartmentalization by dasmegabyte · · Score: 1

      I'd love to be able to plug a chip or two into my PC and have them do better-than-realtime MPEG-4 encoding that doesn't affect my processor at all... Who wouldn't?

      Somebody who thought that buying one chip for $200 was better than buying two for $150?

      I mean, that's why the industry moved to a one chip solution in the first place.

      --
      Hey freaks: now you're ju
    7. Re:Violation of Compartmentalization by evilviper · · Score: 1
      Somebody who thought that buying one chip for $200 was better than buying two for $150?

      Yes, but it's not as if the two options produce the same results...

      The current system of spending $200 more to get a top-of-the-line x86 CPU, while a custom $100 processor would do the job 10x faster, is getting ridiculous.

      In the world of encryption, people have figured this out... $100 on a PCI card, and you can do super-fast encryption. VIA is now tacking some of that functionality into their processors as well, giving their slow CPUs AES encryption capability that will outperform the fastest Intel/AMD processors by several times.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  49. Re:Link to previous discussion on same/similar sub by hype7 · · Score: 4, Interesting

    There's some good stuff in there.

    However, it seems a few organisations have actually beaten us to it.

    Apple, for example, uses the 3D aspect of the GPU to accelerate its 2D compositing system with Quartz Extreme. Microsoft, as usual, announced the feature after Apple shipped it, and with any luck Windows users might have it by 2007.

    -- james

  50. Daisy-chaining the world. by Anonymous Coward · · Score: 0

    Long live the Transputer. Or I'll settle for just a lowly DSP, hiding out in a sound-card, or hard drive.

  51. Apple's getting there...Workstations. by Anonymous Coward · · Score: 0

    "Small steps though they may be, Apple is, as seemingly always is the case, ahead of the game."

    The guys at Next would agree with you.

  52. Re:It's Beer + 12:00 thirty by Anhaedra · · Score: 0

    It is when you are holding a .45 revolver...

    --
    Please flee in terror in an orderly manner.
  53. 3dNOW! by Anonymous Coward · · Score: 0

    There have been studies of the 3dNOW! capabilities of AMD processors in just such a capacity.

  54. Unused computing Power? by JLang22 · · Score: 1, Interesting

    I am a novice in a lot of these discussions so I don't post much. Let me see if I understand this:

    The graphics card has a lot of unused computing power, nearly equal to the main processor chip in the computer if not more, that is not being used when there is no game or video being played, right?

    Is there no way to tap into this power?

    Perhaps it could be used for the main display on the computer (I think you guys call it GUI?)?

    What else could it be used for?

    Could Linux be modified to make use of this power?

    Just a know nothing, nobody with questions.
    J

    1. Re:Unused computing Power? by PitaBred · · Score: 4, Insightful

      Lemme try to help:
      a) Not equal. Apples and oranges. A GPU will do repeated calculations very, very fast, like matrix transforms and the like. A CPU on the other hand will make decisions based on input, rather than just crunching numbers
      b) The main display (the GUI) already uses many tricks on the graphics card. The hard part is making sure that all graphics cards support the features. Things like the xrender extension and such are becoming more common as graphics cards and drivers get "standard" capabilities
      c) Your imagination is the limit as to what it could be used for. Just realize that it's a good data processing unit, not a good program execution unit. Use each for their strengths.
      d) Modified? With new cards/drivers, all it takes is OpenGL calls to start taking advantage of this power. All it really takes is someone who knows what they're doing and has a bit of inspiration.
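
      To make point (a) concrete, here is a small sketch of the kind of work a GPU is built for: one fixed matrix applied to every point in a list, with no decisions made along the way. It runs on the CPU in plain Python, and `transform`, `rot90` and the sample points are made up for illustration.

```python
import math


def transform(matrix, point):
    """Multiply a 2x2 matrix by a 2D point."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


angle = math.pi / 2  # rotate by 90 degrees counter-clockwise
rot90 = ((math.cos(angle), -math.sin(angle)),
         (math.sin(angle), math.cos(angle)))

points = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
moved = [transform(rot90, p) for p in points]
print(moved)  # approximately (0, 1), (-1, 0), (-2, 2)
```

      Each point goes through exactly the same multiplies and adds, so a card can process many of them at once. Code full of if/else branches on the input does not have that shape, which is where the CPU earns its keep.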

    2. Re:Unused computing Power? by NanoGator · · Score: 3, Informative

      "The graphics card has a lot of unused computing power, nearly equal to the main processor chip in the computer if not more, that is not being used when there is no game or video being played, right?"

      Longhorn is supposed to offload a lot of the GUI stuff to the card. So yeah, it'd take advantage of untapped power of the card. However, as for other general purpose stuff, it wouldn't be so interesting. It's kinda like comparing a Ferrari to a school bus. The Ferrari will run circles around the bus, but can only ferry 2 people. The bus can move a LOT of cargo, but not as fast as the Ferrari. We're talking about specialization here. The trick is to find ways to take what the GPU is good at and make it useful.

      --
      "Derp de derp."
    3. Re:Unused computing Power? by JLang22 · · Score: 1

      OK, I think I am starting to understand:
      a) The CPU decides things and the GPU does the math.
      b) The GUI uses some of the math power in the GPU.
      c) Use them for what they are good at.

      Then, it just takes someone to develop or modify existing programs to take advantage of both the CPU and GPU for the parts of the program that each of these handle the best. Right?

      I think I am learning, thanks.

    4. Re:Unused computing Power? by Hast · · Score: 1

      To confuse things a bit more, the newest 3D cards from nVidia actually have some flow control. It's not quite as flexible as that found on a CPU though.

      The thing is that there are not all that many standard programs which would benefit from using the GPU. Requirements are basically that you should have large data sets and do repetitive calculations. Good examples are scientific calculations and graphics.

      I think a good common use would be image programs such as Photoshop.

  55. GPU = by greppling · · Score: 4, Funny

    Now I finally understand that acronym: General purpose unit!

  56. Cool! by Russellkhan · · Score: 1

    Now I can make a PDF from the HTML version of the PDF document that grandparent hunted down on Google!

    Why do we want to do this again?

    --
    Information doesn't want to be anthropomorphized anymore.
  57. AGP read latency not important when not real time. by anti-NAT · · Score: 2, Insightful

    These applications are not likely to generate or process data at such a rate that the slow AGP read speed will matter that much, if at all.

    --
    The Internet's nature is peer to peer - 20050301_cs_profs.pdf
  58. video stuff-Offload. by Anonymous Coward · · Score: 0

    I have a dedicated video processing card that uses a sparc core as part of one of its chips. The rest is basically a DSP, with a video mixer/switcher on board. So basically video offloading has been around for years, but only recently affordable in the prosumer space. No need to hack a GPU to get results.

  59. Re:Maybe time for a new generation of math-process by Anonymous Coward · · Score: 0

    That would often waste resources. If you needed a 3x3 multiply and the hardware only supported 4x4, it ends up doing roughly twice the work needed.
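As a rough check of the "twice the work" estimate above, here is a minimal sketch that counts scalar multiplications in a naive square matrix product. It assumes the hardware pads a 3x3 problem up to a full 4x4 multiply; the function names are illustrative only.

```python
def matmul_multiplies(n: int) -> int:
    """Scalar multiplications in a naive n-by-n times n-by-n matrix product."""
    return n ** 3


def padded_overhead(needed: int, supported: int) -> float:
    """How many times more multiplies the fixed-size hardware performs."""
    return matmul_multiplies(supported) / matmul_multiplies(needed)


if __name__ == "__main__":
    print(matmul_multiplies(3))              # multiplies actually needed
    print(matmul_multiplies(4))              # multiplies the 4x4 unit performs
    print(round(padded_overhead(3, 4), 2))   # ratio of the two
```

Under that assumption the ratio comes out a little above two (64 multiplies against 27), so "roughly twice" is if anything slightly generous.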

  60. Re:Maybe time for a new generation of math-process by BlueJay465 · · Score: 4, Informative

    Well they already make DSP cards for audio processing. Simply do a google(TM) search for "DSP card" and you will get several vendors.

    I can't imagine it would take a whole lot to hack them for just their processing power outside of audio applications.

  61. ogg/vorbis encoding? by Anonymous Coward · · Score: 0

    hi!

    with all this fft & wavelet talk, does anyone know of a gpu version of oggenc?

  62. transistor counts through the ages by nothings · · Score: 5, Informative
    Transistor counts keep growing, so I keep updating this and reposting it about once a year.

    486 : 1.2 million transistors
    Pentium : 3 million transistors
    Pentium Pro : 5.5 million transistors
    Pentium 2 : 7.5 million transistors
    Nvidia TNT2 : 9 million transistors
    Alpha 21164 : 9.3 million (1994)
    Alpha 21264 : 15.2 million (1998)
    Geforce 256 : 23 million transistors
    Pentium 3 : 28 million transistors
    Pentium 4 : 42 million transistors
    P4 Northwood : 55 million transistors
    GeForce 3 : 57 million transistors
    GeForce 4 : 63 million transistors
    Radeon 9700 : 110 million transistors
    GeForce FX : 125 million transistors
    P4 Prescott : 125 million transistors
    Radeon X800 : 160 million transistors
    P4 EE : 178 million transistors
    GeForce 6800 : 220 million transistors
    here's the non-sucky version since <ecode> doesn't actually preserve spacing like <pre>.
    1. Re:transistor counts through the ages by Dominic_Mazzoni · · Score: 1

      It would be great if you could add PowerPC and UltraSparc processors to that list.

    2. Re: transistor counts through the ages by Black+Parrot · · Score: 2, Informative


      > Transistor counts keep growing, so I keep updating this and reposting it about once a year.

      For those who don't already know, what we now think of as "Moore's Law" was originally a statement about the rate of growth in the number of transistors on a chip, not about CPU speed.

      --
      Sheesh, evil *and* a jerk. -- Jade
    3. Re:transistor counts through the ages by Anonymous Coward · · Score: 0

      maybe add a couple AMD chips after the designs started to really diverge from Intel designs, also having the years listed for all the items would be a good addition.

    4. Re:transistor counts through the ages by logic-gate · · Score: 1

      It's not the size of your transistors but what you do with them that counts.

    5. Re:transistor counts through the ages by SmackCrackandPot · · Score: 1

      You could try adding .'s and using <tt>

      486 . . . . . : 1.2 million transistors
      Pentium . . . : 3 million transistors
      Pentium Pro . : 5.5 million transistors
      Pentium 2 . . : 7.5 million transistors
      Nvidia TNT2 . : 9 million transistors
      Alpha 21164 . : 9.3 million (1994)
      Alpha 21264 . : 15.2 million (1998)
      Geforce 256 . : 23 million transistors
      Pentium 3 . . : 28 million transistors
      Pentium 4 . . : 42 million transistors
      P4 Northwood .: 55 million transistors
      GeForce 3 . . : 57 million transistors
      GeForce 4 . . : 63 million transistors
      Radeon 9700 . : 110 million transistors
      GeForce FX. . : 125 million transistors
      P4 Prescott . : 125 million transistors
      Radeon X800 . : 160 million transistors
      P4 EE . . . . : 178 million transistors
      GeForce 6800. : 220 million transistors

    6. Re:transistor counts through the ages by Tomster · · Score: 1

      Nice. Here are a few more entries to bring you down to the origins of the x86 line:

      Jun 1978 8086 . . .: 29000
      Feb 1982 80286 . . : 134000
      Oct 1985 386DX . . : 275000

  63. I think I speak for many of us by Sycraft-fu · · Score: 5, Insightful

    When I say oh shut the fuck up.

    Sorry for the flames, but seriously, I get so damn sick of all the "all new games suck" whiners. Look, there are legit reasons to want new technology. It is nice to have better graphics, more realistic sound, etc. It is NICE to have game that looks and sounds more like reality. Yes, that doesn't make the game great, but that doesn't mean it's worthless.

    What's more, don't pretend like all modern games suck while old games ruled. That's a bunch of bullshit. Sure, there are plenty of modern games that suck, but guess what? There are tons of old games that suck too. Thing is, you just tend to forget about them. You remember the greats that you enjoyed or heard about, the ones that helped shape gaming today. You forget all the utter shit that was released, just as is released today.

    So get off it. If you don't like nice graphics, fine. Stick with old games, no one is forcing you to upgrade. But don't pretend like there is no reason to want better graphics in games.

    1. Re:I think I speak for many of us by Tim+C · · Score: 5, Insightful

      Hear, hear.

      There's something that's always puzzled me a little about this site - attached to every single article about some new piece of PC tech - a faster processor, better graphics card, etc - there are a number of comments bemoaning the advance. All of them saying that people don't need the power/speed they have already, that they personally are just fine with 4 year old hardware, or, in this case, that better graphics don't make for better games. Hell, the same is true for mobile phones - I've lost count of the number of comments bemoaning advances in them, too.

      It's funny, but I thought this was supposed to be a site for geeks; aren't geeks supposed to *like* newer, better toys?

      To get back on topic - no, better graphics are not sufficient for a better game. However, if the gameplay is there, then they can certainly make the experience more enjoyable. Would Quake have been as much fun if it was rendered in wireframes?

      Better graphics help add to the sense of realism, making the game a more immersive experience. The whole point of the majority of games is entertainment and (to an extent) escapism. Additionally, what a lot of people like the grand-parent poster seem to forget is that most of the big-name game engines are licensed for use in a number of games. Let people like id spend their time and money coming up with the most graphically intensive, realistic engine they can. Think Doom 3'll suck because the gameplay will be crap? Fine, then wait for someone to license the engine and create a better game with it. In the meantime, please shut up and remember that there are those of us who like things to be pretty, as well as useful/well made/fun/(good at $primaryPurpose)

      Good graphics on their own won't make a good game, but they will help make a good game great.

    2. Re:I think I speak for many of us by Anonymous Coward · · Score: 0

      Given enough time, there will be games that will use the equivalent computational power of the fastest super computer available today for some secondary task like rendering the clouds or the waves on a beach. And it'll be done because it'll give you something nice to look at while waiting for a respawn.

    3. Re:I think I speak for many of us by Anonymous Coward · · Score: 0

      It's funny, but I thought this was supposed to be a site for geeks; aren't geeks supposed to *like* newer, better toys?

      However, Slashdot is not restricted to such people... no?

    4. Re:I think I speak for many of us by Entropius · · Score: 1

      Grandparent has a valid point.

      Computers are a bucketload faster now than they were five years ago... and games look a bucketload better. This is a good thing.

      I only want to see more advances in how to use all this power for AI/complex worlds as well as advances in graphics. The development of pretty pictures has outstripped the development of smart NPC's for quite a while; NWN looks a hell of a lot better than Nethack, but the critters aren't much smarter (and in some cases are dumber).

      Nobody's criticizing modern games, other than to say "There are lots of neat things that could be done with all this processing power that haven't been done yet."

    5. Re:I think I speak for many of us by Unregistered · · Score: 1

      It's jealousy. People see articles about technology they can't afford and say it sucks because they can't have it.

    6. Re:I think I speak for many of us by ducomputergeek · · Score: 1
      Reason why we bitch at cell phones and features is honestly, how many times have I fired up the browser to surf the net? 0 in 2 phones over the last 4 years. Yet my service hasn't improved in terms of quality of calls and I still get quite a few dropped calls. And I've been with 3 different companies and yet no real difference in terms of quality of call and dropped calls. This is what the mass public wants. I mean being able to chat with AIM or MSN users is neat, but really 90% of us really only want clear calls that aren't dropped 30% of the time.

      People here on this site are primarily that other 10% that wants the latest and greatest geeky tech stuff. Now as I get older, I no longer play as many games or as much. Most I play is an occasional EA hockey game if I am over with the guys at one of their houses with an X-Box.

      Honestly most people here forget that over 90% of the mass general public could care less about the texture mapping of games. They use their computer for email, internet, office type apps (word processor and maybe spreadsheet), Quicken, and solitaire. I think my dad's old 486 handled most of those tasks okay. A lot of people here forget that most people truly are not like the geeks and nerds and do not care. And I must be turning into one of those "normal" people as I get into my upper 20's now because I had a palm pilot, went back to pen and paper to schedule my meetings, I haven't purchased a new game since 2000. I don't own a Playstation 2 or X-Box and my G3 iBook seems to handle everything I need at home just fine.

      Back to the topic, using GPU for vector math would be great in the industry I am now in (animation/rendering). The company I've been working with as a consultant ditched their Alpha boxes and proprietary rendering software for off the shelf IBM Blade servers and both Lightwave and Maya for the rendering bit. They did a sample rendering test and the freaking quad Alpha 500 boxes still edged out a single dual 2GHz Xeon blade. However now they have 28 processors in about the same space as an old quad Alpha box plus reduced the size of the server room by 35% and added space for an additional 5 employees and we now use their workstations to process renders when they are gone in the evenings. Imagine what we could do if we could tap their latest and greatest vid card as well to do their renderings.

      --
      "The problem with socialism is eventually you run out of other people's money" - Thatcher.
    7. Re:I think I speak for many of us by Anonymous Coward · · Score: 0

      It's funny, but I thought this was supposed to be a site for geeks; aren't geeks supposed to *like* newer, better toys?

      It's raw coolness vs. problem solving.

      You don't have to be as clever with newer stuff to get stuff done, and being clever is a big part of nerdliness.

    8. Re:I think I speak for many of us by trashme · · Score: 1

      Graphics can be used to enhance the gameplay. A lot of first person shooters now make you rely on stealth and cunning instead of running into a room with guns blazing. With more detailed graphics you can get subtle hints of an enemy's whereabouts from the environment. Casting shadows. Disturbing branches or bushes, or other parts of their environment.

      Subtle cues like these were not possible years ago. The graphic detail just was not available in real time.

    9. Re:I think I speak for many of us by dasmegabyte · · Score: 1

      Geeks may like toys, but some of us don't see the point in touting every minor advance. Yes, technology got better, but that's what it does. I've got no problem with you and your pixel shader, but *I* don't need it.

      Of course, I think a lot of people are worried that those who write GOOD games are going to do so requiring the new hardware. Which would force those of us who wanted to play, but did not have the new hardware, to go out and essentially waste money for one game.

      I did this recently for KOTOR.

      It is nice to see somebody's thinking about what else to use this graphics card for BESIDES graphics.

      --
      Hey freaks: now you're ju
    10. Re:I think I speak for many of us by LordMyren · · Score: 1

      geeks like better newer, we just have different definitions from marketing as to what qualifies for better.

  64. Let me check my notes... by Impeesa · · Score: 4, Interesting

    I did a paper on the topic of general-purpose GPU programming for my parallel computing course just this last semester here, interestingly enough. I believe our research indicated that even a single PCI card was so badly throttled by the bus throughput that it was basically useless. AGP does a lot better taking data in, but it's still pretty costly sending data back to the CPU. I have a feeling your proposed setup will be a whole lot more feasible if/when PCI Express becomes mainstream.

    1. Re:Let me check my notes... by sonamchauhan · · Score: 2, Informative

      Seems worth checking out: GPGPU.ORG - "General-Purpose Computation Using Graphics Hardware"

      > AGP does a lot better taking data in, but it's still pretty
      > costly sending data back to the CPU.
      I've heard that mentioned a few times, is it true?

      From the AGP 3.0 spec:
      The AGP3.0 interface is designed to support several platform generations based upon 0.25µm (and
      smaller) component silicon technology, spanning several technology generations. As with AGP2.0, the
      physical interface is designed to operate at a common clock frequency of 66 MHz. Its source
      synchronous data strobe operation, however, is octal-clocked and transfers eight double words
      (Dwords) of data within the span of time consumed by a single common clock cycle. The AGP3.0 data
      bus provides a peak theoretical bandwidth of 2.1 GB/s (32 bits per transfer at 533 MT/s). Both the
      common clock and source synchronous data strobe operation and protocols are similar to those
      employed by AGP2.0.


      Later on Page 96:
      Traditional AGP devices can demand up to the maximum bandwidth available over the AGP ports.
      However, the AGP system does not guarantee to deliver the requested bandwidth, nor does it guarantee
      transfers will take place within some clearly specified request/transfer latency time. ...
      This is done by the system guaranteeing to process a specified number (N) of read or write transactions of a specified size (Y) during each isochronous time period (T). An AGP3.0 device can divide this bandwidth between read and write traffic as appropriate. Further, the system transfers isochronous data over the AGP3.0 Port within a specified latency (L).

      (emphasis mine)

      I'm no expert, just asking if the "low upstream bandwidth" assumption is true. If it is, there could still be some applications (eg: simple data compression) that could use it. Also, maybe output from VGA/DVI ports could be tapped.
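The peak figure in the spec excerpt quoted above can be checked with a little arithmetic. This is only a sketch using the numbers in the quote (32 bits per transfer, 533 MT/s, a 66 MHz common clock with eight transfers per clock); the function name is made up for illustration, and it says nothing about real-world read-back speed.

```python
def peak_bandwidth_bytes_per_s(transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak: transfers per second times bytes moved per transfer."""
    return transfers_per_s * bus_width_bits / 8


if __name__ == "__main__":
    # Figures taken from the quoted AGP 3.0 text.
    peak = peak_bandwidth_bytes_per_s(533e6, 32)
    print(peak / 1e9)   # GB/s, about 2.1

    # Octal clocking: eight transfers per common clock cycle.
    # A nominal 66 MHz clock gives roughly the quoted transfer rate.
    print(66e6 * 8 / 1e6)   # MT/s, close to the quoted 533
```

Note this is the theoretical peak of the bus in either direction; the thread's complaint is about what drivers actually deliver on the way back.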

    2. Re:Let me check my notes... by Hast · · Score: 1

      I read about a test someone posted on Slashdot perhaps half a year ago about this. His conclusion was that there were significant differences in speed between DirectX and OpenGL. IIRC DirectX could put data to the card fast, but had very slow read back. OpenGL had slightly lower to card speed but the read-back was almost identical.

      I haven't been able to find it since then however.

    3. Re:Let me check my notes... by Jah-Wren+Ryel · · Score: 1

      From what I have read, most video cards (or at least the drivers for said cards) do not implement the bidirectional functionality of the AGP 3.0 spec.

      Bidirectionality is a feature often touted of the new PCI-Express hosted GPUs, as if even AGP 3.0 was still only one way. If they would just do AGP 3.0 right it wouldn't be such a big deal, but with AGP now officially an obsolete bus, chances are no one will spend the engineering resources necessary to make full AGP 3.0 functionality available. Onward and Upward! (or at least Onward and Sideways!)

      --
      When information is power, privacy is freedom.
    4. Re:Let me check my notes... by sonamchauhan · · Score: 3, Interesting

      Somewhere in this story, I found a post with a link that explains this is a software problem:
      Notice that they're quick to point out the problem isn't likely a hardware issue. There should be plenty of bandwidth on the AGP bus, but graphics chip makers don't seem to have written their drivers to handle transfers from AGP cards to main memory properly.

      Then they run some tests and conclude:
      That means even if you can render high-quality images at 30 frames per second, you won't be able to get them out of the graphics card at anything near that rate.

  65. Alternative use by Zog+The+Undeniable · · Score: 2, Interesting

    Remember the story about PS2's being used in Iraqi WMDs? No doubt the next "outlaw state" will be accused of using GeForce Ti4600's to manage fast breeder reactors.

    --
    When I am king, you will be first against the wall.
    1. Re:Alternative use by tiger99 · · Score: 1
      You don't need much more than analogue hardware or maybe a Z80 processor, with lots of backups, preferably totally dissimilar, to manage fast breeder reactors. Reliability and redundancy are the key issues. You don't want anything that can have a BSOD of course.... The thing you probably are confusing it with is simulating weapons to get the design right, before you build one, that really does need a vast array of processors, maybe something like a Beowulf would do. Such simulation is probably also used for modern conventional weapons, to assist the design process, in the same way that it is now used in simulating crash testing of cars, and many other complex 3-D problems.

      There again, it has been said that no new design of atomic bomb ever failed at the first attempt, and that was mostly before the computing era, so I think that what it avoids is countless implosion tests to get the high explosive bit correct. I think that some H-bombs may have been a disappointment, they are apparently many orders of magnitude more difficult. Much info is published on web sites.... I don't think that it should be, or in books for that matter, but the stable door can't be closed now, the horse was allowed to bolt many years ago.

  66. Patent The Idea Now! by KageMonkey · · Score: 0

    1) Patent the idea of using spare GPU cycles to do non-graphic related computational work. 2) ????? 3) Licensing the idea to ATi and nVidia 4) Profit!

  67. Re:Maybe time for a new generation of math-process by pe1chl · · Score: 4, Insightful

    What I remember about co-processing cards and "intelligent peripheral cards" (like raid controllers or network cards with an onboard processor) is this:

    There is a certain overhead because a communications protocol is to be established between the main processor and the co-processor. For simple tasks the main processor often stops and waits for the co-processor to complete the task and retrieves the results. For more complicated tasks, the main processor continues but later an interrupt occurs that the main processor must service.

    You must be very careful or the extra overhead of this communication makes the execution of the task slower than without the co-processor. This is certainly going to happen at some time in the future, when you increase central processor power all the time but keep using the same co-processor.

    For example, your matrix co-processor needs to be fed the matrix data, be told to start working, and signal when it is finished. Your performance would not only be limited by the processor speed, but also by the bus transfer rate, and by the impact those fast bus transfers have on the CPU-memory bandwidth available and the on-CPU cache validity.
    When you are unlucky, the next CPU you buy is faster in performing the task itself.
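The trade-off described above can be put into a back-of-the-envelope model. The sketch below is an assumption-laden simplification: it treats offload time as fixed overhead plus upload, co-processor compute, and download, and ignores the cache and memory-bandwidth effects mentioned in the comment. All names and numbers are hypothetical.

```python
def offload_time(data_bytes: float, result_bytes: float,
                 bus_up_bytes_per_s: float, bus_down_bytes_per_s: float,
                 coproc_time_s: float, fixed_overhead_s: float = 0.0) -> float:
    """Total wall time when handing the task to a co-processor."""
    upload = data_bytes / bus_up_bytes_per_s
    download = result_bytes / bus_down_bytes_per_s
    return fixed_overhead_s + upload + coproc_time_s + download


def worth_offloading(cpu_time_s: float, **offload_args: float) -> bool:
    """True only if the offloaded path beats doing the work on the CPU."""
    return offload_time(**offload_args) < cpu_time_s


if __name__ == "__main__":
    # Hypothetical job: 8 MB in, 8 MB out, co-processor needs 10 ms.
    job = dict(data_bytes=8e6, result_bytes=8e6,
               bus_up_bytes_per_s=2e9, coproc_time_s=0.01)
    # Slow read-back path: the transfer dominates and the CPU (50 ms) wins.
    print(worth_offloading(0.05, bus_down_bytes_per_s=0.1e9, **job))
    # Symmetric bus: the same job is now worth offloading.
    print(worth_offloading(0.05, bus_down_bytes_per_s=2e9, **job))
```

The same model also shows the "next CPU is faster" case: shrink `cpu_time_s` while leaving the co-processor figures alone and the answer flips back to the CPU.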

  68. Dual Core by BrookHarty · · Score: 4, Interesting

    With Dual Core CPU's going to be the norm, why not a Dual Core GPU for even faster gfx cards? With everyone wanting 16x antialiasing at 1600x1200 to get over 100fps, it's gonna take some very powerful GPU's (or some dual cores).

    Even with the ATI 800XT, 1600x1200 can dip below 30FPS with AA/AF on higher settings. Still a ways to go for that full virtual reality look.

    1. Re:Dual Core by PhrostyMcByte · · Score: 2, Informative

      Video cards are already able to run many things in parallel- they are beyond dual-core.

    2. Re:Dual Core by BrookHarty · · Score: 2, Interesting

      Video cards are already able to run many things in parallel- they are beyond dual-core.

      There were dual ATI GPU's or Matrox or even the old Voodoo2 SLI. Seems you can increase speed with more cores.

    3. Re:Dual Core by tiger99 · · Score: 1

      I can understand 1600*1200, in fact I am using it now, to get a lot of info on the screen, but over 100fps provides no tangible benefit except to the graphics card manufacturers. There is no point in paying to display what the eye can't distinguish.

    4. Re:Dual Core by wwahammy · · Score: 1

      Check out XGI. They're an upstart company trying to compete with Nvidia and ATI. They're using a dual core GPU for their fastest cards. Granted they're doing that to catch up with the big two but in the future it could still be used with more powerful GPUs.

    5. Re:Dual Core by BrookHarty · · Score: 2, Informative

      I can tell up to about 80'ish FPS, but I run the refresh rate at 85 or 100 for no flicker. So yes there is a point for higher FPS. But you didn't say you played video games. And if you turn vsync off you get tearing.

      I remember awhile back someone did quake2 benchmarks on accuracy vs FPS, and how 79FPS (i think) was the sweet spot, faster and lower refresh rate had a negative effect on accuracy.

      But I won't argue 20FPS over 80, but 100 seems to be target. imho

    6. Re:Dual Core by Anonymous Coward · · Score: 0

      You already have dual core VPU's, check www.3dlabs.com for the Wildcat Realizm papers.

    7. Re:Dual Core by brokenbeaker · · Score: 1

      Yes there were multi-chip cards. You don't see them anymore, do you? The Voodoo did line-level parallelism, ATI did screen level. We now have much finer level parallelism within the GPUs, with their parallel texture pipelines etc. Presumably this is more efficient, I would guess due to getting data on/off the chip.

    8. Re:Dual Core by renoX · · Score: 1

      Except of course that what is 100fps in a game will be much lower in a newer game, so you'll be happy to have a powerful videocard.

      Do you play Flight Simulators?
      Currently nobody is able to play Forgotten Battle at 1600*1200 with AA at perfect settings (well you can play if you want to see a slide show), and I won't even talk about Lock On which is even more hungry for power I've heard (both CPU and videocard).

  69. Audio DSP by buserror · · Score: 4, Informative

    I've been thinking about using the GPU for audio DSP work for some time, even got to a point where I could transform some signal by "rendering" it into a texture (in a simple way, I could mix two sounds using the alpha as factor).
    The problem is that these cards are made to be "write only" and that basically fetching back anything from them is *very* slow, which makes them totally useless for the purpose, since you *know* the results are there, but you can't fetch them in a useful/fast manner.
    I wonder if it's deliberate, to sell the "pro" cards they use for the rendering farms
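For readers wondering what "mixing two sounds using the alpha as factor" amounts to, here is a CPU-side sketch of the arithmetic. It assumes the usual blend of the form alpha*a + (1 - alpha)*b applied per sample; the comment does not say exactly which blend mode was used, and on the card each sample would be a texel rather than a list element.

```python
def alpha_mix(a: list[float], b: list[float], alpha: float) -> list[float]:
    """Blend two equal-length sample buffers: alpha*a + (1 - alpha)*b per sample."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


if __name__ == "__main__":
    left = [1.0, 0.0, -1.0, 0.0]
    right = [0.0, 1.0, 0.0, -1.0]
    print(alpha_mix(left, right, 0.5))   # equal mix of both signals
    print(alpha_mix(left, right, 1.0))   # only the first signal survives
```

Every output sample depends only on the matching pair of inputs, which is why the operation maps so naturally onto per-pixel blending hardware; the hard part, as noted, is reading the result back.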

    1. Re:Audio DSP by Evan+Meakyl · · Score: 1

      I had the same idea a few months ago, and faced the same problem. However, we can hope that the upcoming "PCI Express" will help reduce this fetch time. Still, this kind of card will surely be very expensive at first... I prefer investing in a good CPU :)

    2. Re:Audio DSP by SmackCrackandPot · · Score: 3, Insightful

      I wonder if it's deliberate, to sell the "pro" cards they use for the rendering farms

      No, it's just the way that the OpenGL and DirectX API's evolved. There never was any need in the past to have a substantial data feedback. The only need back then was to read pixelmaps and selection tags for determining when an object had been picked.

    3. Re:Audio DSP by attaka · · Score: 2, Funny

      Easy!
      You just have to figure out how to connect that toslink cable to the digital monitor connector.

    4. Re:Audio DSP by dustman · · Score: 1

      No, it's just the way that the OpenGL and DirectX API's evolved.

      Wrong again... The root cause is that there has never been enough of a need for high-speed transfer from the video card to system memory.

      Both DirectX and OpenGL provide means for doing it, and if the underlying hardware supported it, an implementation could be made that transferred output data (say, a screen which has finished rendering) to the system just as fast as input data (texture uploads) to the video card.

      With the newer PCI standard coming out, the underlying hardware will support this sort of transfer, and the video card manufacturers will start to add support for it.

  70. Swings and Roundabouts by turgid · · Score: 1
    In the days of yore, CPUs were too slow to do the kind of 3D that gamers wanted, so GPUs were developed (actually they were developed 10+ years earlier by the likes of SGI(RIP) for CAD and scientific visualisation).

    Then there is a quantum change in CPU design, and CPUs catch up with GPUs (at least on paper). Then someone finds out something new and cool they can do by pushing these CPUs to the limit, but only just. Then, the scientists and engineers decide it's useful and someone makes expensive dedicated hardware. Then the gamers find out about it and buy less expensive but $1000 hardware. Then there is a period of frenzied competition and it comes down to $100.

    Then, someone goes and thinks of something new and cool to do with this cheap high-powered hardware....

    1. Re:Swings and Roundabouts by Gumber · · Score: 1

      SGI's early version of the Reality Engine used multiple units of the Intel i860, a RISC chip and one of Intel's many attempts to move beyond the x86, to do the serious crunching. If one reads the paper in IEEE Micro when the i860 was released, one can see that it was positioned as a general purpose CPU for engineering workstations. I don't think it was used in said capacity in anything other than a reference design.

  71. This is wonderful.. by adeyadey · · Score: 1

    Could this mean that we could evolve a "Back-door" way to dump the disgusting x86 architecture? Think about it - we devise a universal OS way in both Linux/Windows of allocating tasks/threads to "external" RISC processors. At some stage, these can be the "main" processors, able to run the host/boot-up/old code under emulation. Then, dump the 386!

    Think about it..

    --
    "You lied to me! There is a Swansea!"
    1. Re:This is wonderful.. by Anonymous Coward · · Score: 0

      x86 isn't all bad anymore, things are shaping up with x86-64; 16 general purpose registers and the vector unit (sse3) takes care of the largest issues.
      Sure, the instruction-set encodings are kindof kinky, but on the other hand it may also be more space efficient (cache friendly) than a fixed-length-instruction native risc encoding. (ok - don't really believe that myself, but.. ;))

    2. Re:This is wonderful.. by adeyadey · · Score: 1

      Trouble is I hear that pretty much every time there's a new Pentium - ie, the internal de-x86-ifier is sooo clever it runs x86 code as well as if it were coded for a RISC processor. But it never holds out in real life..

      Let's all go back to asm..

      --
      "You lied to me! There is a Swansea!"
  72. Re:Link to previous discussion on same/similar sub by Crazy+Eight · · Score: 5, Informative

    QE is cool, but it doesn't do anything similar at all to what they're talking about here. FFTs on an NV30 are only incidentally related to texture mapping window contents. Check out gpgpu.org or BrookGPU. In a sense, the idea is to treat modern graphics hardware as the next step beyond SIMD instruction sets. Incidentally, e17 exploited (hardware) GL rendering of 2D graphics via evas a bit before Apple put that into OS X.

  73. Commodore 64 by curator_thew · · Score: 5, Interesting


    This concept was being used back in 1988. The Commodore 64 (1mhz 6510, a 6502 like micro processor) had a peripheral 5.25 disk drive called the 1541, which itself had a 1mhz 6510 cpu in it, connected via. a serial link.

    It became common practice to introduce fast loaders: these were partially resident in the C64, and also in the 1541: effectively replacing the 1541's limited firmware.

    However, demo programmers figured out how to utilise the 1541: one particular demo involved uploading a program to the 1541 at start, then upon every screen rewrite, uploading vectors to the 1541, which would perform calculations in parallel with the C64. Then at the end of the screen, the C64 would fetch the results from the 1541 and incorporate them into the next screen frame.

    Equally, GPU provides similar capability if so used.

    1. Re:Commodore 64 by kunudo · · Score: 1

      Brilliant, except, that's what GPU's were made for in the first place.

      Of course, you were probably speaking in more general terms.

    2. Re:Commodore 64 by sploxx · · Score: 1

      Ahh, the C64 times..

      It was even better, the ratio was 1:2 (the 1541 had a 2MHz processor!)

    3. Re:Commodore 64 by Temkin · · Score: 1



      1988? Try 1984.

    4. Re:Commodore 64 by pommiekiwifruit · · Score: 2, Insightful

      I would be interested in a reference for that, since the 1541 serial link was so slow. If you are talking about Mindsmear that was not actually released, but a demo would have to be pretty clever to make the communication time worth while (and accurate with the screen still turned on).

    5. Re:Commodore 64 by curator_thew · · Score: 3, Informative

      I don't recall exactly: maybe Horizon, definitely Scandinavian. I remember because I decompiled it! What happened was that I started the demo, and unusually the disk drive kept spinning, so I turned it off, which caused the demo to fail. I tested loading it and then trying to start the demo, and it didn't work; curiosity, an Action Replay and an IRQ investigation revealed what was going on. I think it was a single-part demo: the most memorable C64 demo for me because of that trick.

    6. Re:Commodore 64 by Anonymous Coward · · Score: 1, Informative

      This concept was being used back in 1988. The Commodore 64 (1 MHz 6510, a 6502-like microprocessor) had a peripheral 5.25-inch disk drive called the 1541, which itself had a 1 MHz 6510 CPU in it, connected via a serial link.

      Actually, the processor in the 1541 was an ordinary 6502. The 6510 added some memory mapping stuff that the drive didn't need.

  74. Expand this thinking! by Osty · · Score: 3, Interesting

    You're absolutely correct that these "game snobs" are looking at the past through rose-colored graphics, forgetting all of the stinkers of yesteryear. However, it's not just games where this applies. How many times have you heard people complain about how bad movies are now, or music, or books? It's exactly the same phenomenon. When your grandfather tells you how much better things were "back in the day", it's for exactly the same reason. He's looking back at all the good things, while ignoring all of the bad.


    Face it, everything mostly sucks. It always has, and it always will. There will always be some gems that really stand out, and those will be what are remembered when people fondly look back on "the old days". Get over it.

    1. Re:Expand this thinking! by brokenbeaker · · Score: 1

      You must admit, though, that in your grandfather's days, things were quite different. Not so much free info, you could count on having a job for a long time, cold war, more overt racism, whatever. It might be selective memory that makes him prefer the past, but you can't just say, "things have always been the same, always sucky". Things definitely change, and we have to decide if we like how they have changed, and act accordingly.

    2. Re:Expand this thinking! by Anonymous Coward · · Score: 0

      The difference is that people can remember a "Golden Age" of video games: when machines were new and the current styles of gaming were invented.

      It was a "Golden Age" not because today's gaming industry sucks but because, since nothing pre-existed, everything was new.

      Nowadays, we see fewer disruptive new ways of gaming only because so much has already been done.

  75. Good one! by Pan+T.+Hose · · Score: 1

    Now I finally understand that acronym: General purpose unit!

    Please mod parent up: +5, Funny!

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
  76. That benchmark is bullshit by interJ · · Score: 1

    They say the GPU outperforms the CPU, but they do not test it fairly: They did not use SIMD instruction sets such as SSE for the CPU code, but coded it in "straight C++". If they had, the CPU code would probably have run faster than the GPU version (at least for the matrix multiplication.)

    1. Re:That benchmark is bullshit by Shinobi · · Score: 1

      SSE/SSE2/SSE3, MMX, 3DNOW or Altivec is not even comparable, since they are so limited in how they implement vector math. A GPU is basically a multi-pipeline vector processor.

      Much of what you see in 3D games are matrix multiplications. Try telling me that a game such as Far Cry would run faster rendered by the CPU, even with SSE2 or so involved, than on a GPU.

    2. Re:That benchmark is bullshit by Anonymous Coward · · Score: 0

      Doubly bullshit, in that they also don't bother to use blocking to improve the cache performance. The only fair way to compare these things is to optimize both implementations as much as possible. The CPU is more flexible than the GPU, and an unbiased test must take this into account. For the 1500*1500 matrix product benchmark, on my laptop (2GHz P4, compared to the 1.5GHz P4 they used), it's 13 seconds using a reasonable blocked algorithm. This compares to 20+ seconds for their GPU algorithm and about 90 for their CPU version. The only lesson here is that if you compare to something very stupid, you can demonstrate wonderful speedups.

      The situation with their other "realistic" benchmark, 3-SAT, is even worse, in that their algorithm is so naive as to be laughable. With NP-hard problems, any fixed speedup is basically useless against the exponential complexity. The only way to make progress is through more clever algorithms, which is exactly where the GPU falls on its face.
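
      For anyone unsure what "blocking" means here, a minimal sketch follows. It is not the code from the paper or from my laptop run; the block size of 32 is an arbitrary assumption, and pure Python will not show the cache effect, only the loop structure you would port to C:

```python
def matmul_naive(a, b):
    """Textbook three-loop product of two square matrices (lists of lists)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, block=32):
    """Same product, computed tile by tile so each tile can stay in cache."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        i_end = min(ii + block, n)
        for kk in range(0, n, block):
            k_end = min(kk + block, n)
            for jj in range(0, n, block):
                j_end = min(jj + block, n)
                # Work on one block x block tile of A, B and C at a time.
                for i in range(ii, i_end):
                    row_c = c[i]
                    for k in range(kk, k_end):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, j_end):
                            row_c[j] += aik * row_b[j]
    return c
```

      The tile size would normally be tuned so that three tiles fit in the cache level you care about.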

    3. Re:That benchmark is bullshit by PipsqueakOnAP133 · · Score: 1

      Highly doubtful. Throwing better hardware at the problem (GPUs) versus using what can be reduced to a tiny coprocessor add-on inside the CPU (SSE*) is pretty much a no-contest scenario.

      In some of the results, the GPU was found to be over 16X faster. There's no way that SSE would have helped since the parallelism in the GPU is way beyond what you can get out of SSE. Sure, it might have made it closer, but having the CPU lose by a 4X multiplier still grants their study credit as they wanted to see that GPUs can do stuff better in certain cases.

  77. Re:Link to previous discussion on same/similar sub by EvilNTUser · · Score: 1

    "Microsoft, as usual, announced the feature after Apple shipped it"

    God I'm tired of hearing that phrase over and over again when 95% of the time it's just because Apple can control the hardware and it would be a total disaster if MS included a technology as fast as they do...

    --
    My Sig: SEGV
  78. Re:Maybe time for a new generation of math-process by Squant · · Score: 2, Insightful

    Math coprocessor boards would be great, but still quite fixed-function.

    It would be much more efficient to implement a coprocessor with an FPGA. First you program the FPGA with the functions to execute, then you feed the data to it. When the calculation is completed, you just reprogram it to become whatever you want.

    This way you would not have an math only board, but a board that could perform many many functions. You just need to write algorithms to exploit them.

  79. So? How do you program this? by Anonymous Coward · · Score: 1, Interesting

    Are there interrupts? Available to userland?

    The 'acceleration' layer (DirectX, Xv?) is not even available to the programmers. The programmer requests from DirectX or SDL to draw a polygon. Then DirectX or SDL invoke acceleration features of the card. But we do not have direct access to those features. They are not even documented.

    Will the kernel provide those facilities?
    Because it would be stupid to go through SDL to perform FFT with the video card's capabilities.

  80. Re:Link to previous discussion on same/similar sub by cehardin · · Score: 3, Insightful

    I think the real reason Apple comes out with newer and better technology is because they have to fight for their user base. After all, if Apple's products were the same as Microsoft's, who would care?

    Microsoft can afford to be lazy with their products; they make money either way. I don't think that will last forever though. Sometimes they do try hard, NT for example, but then they pile a bunch of poorly designed stuff on top of it and that ruins it. If you can, check out OS X's directory structure, it's beautiful. Now compare that to Windows' cryptic system...

    "Microsoft, as usual, announced the feature after Apple shipped it"

    "God I'm tired of hearing that phrase over and over again when 95% of the time it's just because Apple can control the hardware and it would be a total disaster if MS included a technology as fast as they do..."

  81. Re:Link to previous discussion on same/similar sub by Anonymous Coward · · Score: 0
    God I'm tired of hearing that phrase over and over again when 95% of the time it's just because Apple can control the hardware and it would be a total disaster if MS included a technology as fast as they do...


    I didn't say why it happens, I just said that it happens (ie MS announces the product after Apple). The original comment stands.
  82. Multiple PCI-X cards by HardeH · · Score: 1

    Wouldn't it be a nice experiment to use multiple next-generation PCI-X "graphic" cards on one system as the ultimate set of matrix co-processors? You could create your mini-cluster inside one box! Anyone seen or tried this yet?

  83. All very impressive, but.... by tiger99 · · Score: 3, Insightful
    ... there are a few snags, such as the fact that a GPU will not have (because it normally does not need) memory management and protection, so it is really only safe to run one task at a time. And, does this not need the knowledge of the architecture and instruction set that Nvidia seem to be unable or unwilling to disclose, hence the continuing controversy over the binary-only Linux drivers?

    However I do know that a lot of people had been wondering about this for a while, could it be done, and was it worth attempting, so now we know. Maybe we shall soon see PCI cards containing an array of GPUs, I imagine the cooling arrangements will be quite interesting!

    There are other things which are faster than a typical CPU, are not some of the processors in games machines 128-bit? Again, you could in theory put some of these together as a co-processor of some sort.

    This was a good piece of work technically, but it says something about society that the fastest mass-produced processors, whether for GPUs or games consoles, exist because people want a higher frame rate in Quake. I can't think of any professional application that needs really fast graphics output, but many that could use faster processing. So why can't Intel and AMD stop putting everything in the one CPU (multiple CPUs with one memory are not really much better), and make co-processors again, which will do fast matrix operations on very large arrays, etc, for those who need them? The ultimate horror of the one CPU philosophy was the winmodem and winprinter, both ridiculous. Silicon is in fact quite cheap, as Nvidia have proved, people's time while they wait for long calculations to finish is not.

    Maybe we are going to see an architectural change coming, I expect it will be supported by FOSS long before Longhorn, just like the AMD64.

    1. Re:All very impressive, but.... by norkakn · · Score: 1

      Coprocessors don't really work with today's processors because it takes far too long to communicate between the different parts. When you hear about "units" in today's processors, those are like coprocessors. The clearest example of this is the vector processing unit, usually a 128-bit beast that 10 years ago could have been a separate chip, but now it only really works in practice if they share the same chunk of silicon.

  84. It's already here by mcbridematt · · Score: 1

    Remember NVIDIA's Gelato, which was released a few weeks ago?

    Gelato uses the GPU as a floating point processor, in addition to the CPU. I would still love to see a movie rendered in realtime with OpenGL, though.

    1. Re:It's already here by Trejkaz · · Score: 1

      Well I dunno about OpenGL but Direct3D has been done before, via acting in a game engine with voiceovers.

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
  85. what's really needed by curator_thew · · Score: 3, Interesting


    What's really needed is to couple the GPU and CPU in such a way that the GPU actually runs a very low level O/S, like an L4Ka style kernel (http://l4ka.org/), and becomes "just another" MP resource.

    Then, on top of this low level, actually runs the UI graphics driver and so on. Other tasks can also run, but ultimately the priority is given to the UI driver.

    Then, the O/S on the CPU needs to be able to know generally how to distribute tasks across to the GPU. Fairly standard for a tightly coupled MP that has shared bus memory.

    Why do I say this? Because the result is

    (a) if you're using an especially high performance application, the GUI runs full throttle dedicated to rendering/etc and acts as per normal;

    (b) if you're not, e.g. such as when running Office or Engineering other compute intensive tasks (e.g. recoding video without displaying the video), then the GPU is just another multi processor resource to soak up cycles.

    Then, CPU/GPU is just a seamless computing resource. The fantastic benefit of this is that if the O/S is designed properly, then it could allow simply buying and plugging in additional cards (well, PCI is probably not good because of its low speed, perhaps AGP?) that are simply "additional processors". Then you get a relatively cheap way of putting more MP into your machine.

  86. Stagnant CPU architectures by Anonymous Coward · · Score: 0

    I have programmed in assembly for 20 years, and have also done some assembly programming of DSPs and some chip design. What strikes me with this background is the strangely stagnant state of CPU architecture development when you compare it with other related fields, where GPUs are the latest addition.

    Just look at the newsgroup news:comp.arch which allegedly is dedicated to computer architecture but in reality is about computer archaeology, discussing computers of yesteryear.

    For instance, most DSPs include data pumps and zero overhead loops that would be useful in general purpose CPUs. You also have MACs (admittedly more specialised), max/min functions (often useful) and dataflow features (would definitely be nice).

    Newer architectures feature (sea of) sub processors, multiple busses and pathways. Yet CPUs steadfastly stand still, MMX, SSE1/2/3 notwithstanding.

    Many DSPs on sound cards are reprogrammable, why distributed computing projects like SETI and the like have not taken advantage of these is beyond me. SETI does not even use multithreading features of recent x86 CPUs.

    Now the thrust seems to be in GPUs, and from what I can see they are taking on more and more features, reconfigurability and reprogrammability. All this is very exciting and relevant if the next generation of personal VR systems is to become a reality. VRML was way too slow to be of practical use and still did not have body or head tracking or even eye tracking. All this requires a lot of 3D transformation to work seamlessly. Perhaps within a few years we could have semi-VR like the one shown in Minority Report.

    I agree with many other posts here that massive computational powers is wasted on fancier textures in games of yesteryear. The killer application for more power is, I believe, in personal (mobile) virtual reality.

    1. Re:Stagnant CPU architectures by Anonymous Coward · · Score: 0
      I see graphics cards frequently use multiple memory busses so I guess this should lend itself well to super Harvard architecture, like the proposed 4stack processor.

      BTW did you know that one Amiga design involved a DSP as a blitter? I saw the designs and it looked very, very nice. You might have been able to run a modem from the graphics subsystem, what a hack!

    2. Re:Stagnant CPU architectures by Anonymous Coward · · Score: 0

      >Many DSPs on sound cards are reprogrammable

      You know, Motorola might quite possibly have cornered this market with their DSP563xx series, the fast, big brother of the DSP560xx series used in the NeXT box.

      The chip is nice and orthogonal, as Motorola products often seem to be. You also had on-chip RAM, high-speed serial and parallel busses, timers and more. This would really have been a nice addition to the motherboard standard, and with the built-in PCI interface it would have been efficient too. Being a separate processor, it would have provided a lot of high-speed IO, modem functionality, even synchronous ports, easy reconfigurability and real-time functions.

      But in the end the fatal hardware bugs in, yup, the PCI interface killed off that prospect.

  87. Re:Maybe time for a new generation of math-process by Temkin · · Score: 2, Interesting

    Remember the co-processors? Well, actually I don't (I'm a tad too young). But I know about them.



    Dig deeper. 8087 FPUs were nice, though they ran hot enough to cook on, but the idea had existed for 15 or more years before they appeared. Try looking into the old DEC PDP-11 archives. There you'll find DEC's own "CIS" or "commercial instruction set", which was a set of boards (later an add-on chip) that added string, character and BCD math instructions. DEC also had an FPU card set that implemented a 64-bit FPU out of AMD 2901 bit-slice processors. Many low-budget not-quite-supercomputers were really add-on hardware boxes for general-purpose computers. Basically add-on stunt boxes.


    Damn... I'm too young to feel this old! Most of this stuff was in play when I was in grade school.


    Temkin

  88. I'd say to you: Why? by Anonymous Coward · · Score: 0

    But I learned not all PDFs open in Windows, like this -- click on the link "Clique aqui para obter o arquivo", which means "Click here to get the file".

    That opens normally in Konqueror, but didn't open at work in IE, under Windows 2000 through a proxy, either with Adobe 5 or 6 (well, it opened once with W2K+IE+Adobe 6, on a Xeon 2.8GHz -- probably due to some OS or browser configuration).

    It is still possible to save to disk on W2K and read with Adobe 5 from the disk file, though. But it shows something is making PDFs not so universally readable as one might previously think.

  89. compiler? by Anonymous Coward · · Score: 1, Interesting

    Could this be integrated with a compiler, so that the compiler could elect to use the GPU? That would be really cool:

    gcc --with-gpu somebigprog.c

  90. Sad. :-( by Anonymous Coward · · Score: 0

    People in /. can't even understand your joke.

    I knew most don't have a life, but now it seems many can't remember what having a life looks like.

  91. Because he did the right thing. by Anonymous Coward · · Score: 0

    Citing sources is of paramount importance in scientific discussions. This is not "I think I saw somewhere" stuff...

    This guy is obviously a pro, or a clever dude, if he's still a kid. And he's got a 4. Deserved, IMHO.

    1. Re:Because he did the right thing. by Anonymous Coward · · Score: 0

      someone else has said it and i say it again:

      Citing sources is of paramount importance in scientific discussions. This is not "I think I saw somewhere" stuff...

      This guy is obviously a pro, or a clever dude, if he's still a kid. And he's got a 4. Deserved, IMHO.

  92. because its shit by Anonymous Coward · · Score: 0

    fuck Adobe and fuck their shitty bloated PDF, on Windows i can choose Adobe Acrobat Reader or command line xpdf crap and thats it ! so much for open formats egh ?
    thanks but id rather not read a PDF than use Adobes shite reader and its not 1980 so im not using command line, send me a jpg or text file or if you want web people to read it (this is the internet) HTML

  93. Re:transistor counts through the ages (PPC) by Malic · · Score: 1

    According to http://www.cs.swan.ac.uk/~csneal/HPM/alpha.html...

    G4 - 33 million
    G5 - 52 million

    And from IBM...

    G3 (750cx) - 22 million

    --
    I swear by MacOS X. Although I use to swear *at* MacOS 9...
  94. Re:Maybe time for a new generation of math-process by Anonymous Coward · · Score: 0

    I'd imagine it goes in cycles as communication technologies change - for 1985-1995, the multi-custom-chip amiga design ran rings around the PC, then the PC caught up by brute force. Now may be the time for PCs to go multi-custom-chip for a while.

  95. Bliss by mcgroarty · · Score: 1

    Dual heads = SETI for TeamSlashdot *and* TeamARS!

  96. Re:Link to previous discussion on same/similar sub by Xabraxas · · Score: 1
    In fact, one of their chairmen even claims he has invented the internet.

    I know that was supposed to be a joke but please stop spreading lies. It makes you look unintelligent, and that joke hasn't been funny in years anyway.

    --
    Time makes more converts than reason
  97. OpenGL already used for movies by CoolQ · · Score: 1

    I've already rendered movies in OpenGL...

    mplayer -vo gl dvd://1

    :P
    --Quentin

  98. Very bad article by Slash.ter · · Score: 3, Interesting
    This is a very poor quality article, I analyzed it before. There are possibly better ones mentioned by others.

    Just look at the matrix multiplication case. Look at the graph and see that 1000x1000 takes 30 seconds on the CPU and 7 seconds on the GPU. Let's translate that to millions of operations per second: CPU -> 33 Mop/s, GPU -> 142 Mop/s. Matrix multiplication has cubic complexity, so for the CPU: 1000 * 1000 * 1000 / 30 seconds / 1000000 = 33 Mop/s

    Now think a while: 33 million operations per second on a 1.5 GHz Pentium 4 with SSE (I assume there is no SSE2). The Pentium 4 has a fused multiply-add unit, which lets it do two ops per clock. So we get 3 billion ops per second peak performance! What they claim is that the CPU is 100 times slower than that for matrix multiply. That is unlikely. You can get 2/3 of peak on a Pentium 4. Just look at the ATLAS or FLAME projects. If you use one of these projects you can multiply 1000x1000 matrices in half a second: 14 times faster than the quoted GPU.

    Another thing is the floating point arithmetic. GPU uses 32-bit numbers (at most). This is too small for most scientific codes. CPU can do 64-bits. Also, if you use 32-bits on CPU it will be 4 times as fast as 64-bit (SSE extension). So in 32-bit mode, Pentium 4 is 28 times faster than the quoted GPU.

    Finally, the length of the program. The reason matrix multiply was chosen is because it can be encoded in very short code: three simple loops. This fits well within the 128-instruction vertex code length. You don't have to keep reloading the code. More challenging codes will exceed the allowed vertex code length. The three-loop matrix multiply implementation stresses memory bandwidth. And the CPU has MB/s while the GPU has GB/s. No wonder the GPU wins. But I can guess that without making any tests.
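
    The arithmetic above can be rechecked with a short sketch. The timings (30 s and 7 s) are the ones read off the paper's graph in this comment, and the 2 ops per clock and 2/3-of-peak figures are this comment's assumptions, not measurements:

```python
def mops(n, seconds):
    """Millions of multiply-add steps per second for an n x n matrix product."""
    return n ** 3 / seconds / 1e6

# Timings quoted in the comment for a 1000x1000 multiply.
cpu_mops = mops(1000, 30.0)   # about 33
gpu_mops = mops(1000, 7.0)    # about 142

# Peak assumed in the comment: 1.5 GHz clock, 2 ops per clock.
peak_mops = 1.5e9 * 2 / 1e6

# Time for the same multiply at 2/3 of that assumed peak.
tuned_seconds = 1000 ** 3 / (peak_mops * 1e6 * 2 / 3)

if __name__ == "__main__":
    print(f"CPU {cpu_mops:.0f} Mop/s, GPU {gpu_mops:.0f} Mop/s, peak {peak_mops:.0f} Mop/s")
    print(f"tuned CPU estimate: {tuned_seconds:.2f} s, GPU/tuned ratio: {7.0 / tuned_seconds:.0f}x")
```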

  99. Crypto is an interesting problem by thogard · · Score: 1

    Years ago AT&T made something called the pixel machine which was a bunch of DSPs in a box and in effect a cluster of GPUs. There are rumors that the spooks used them for DES cracking.

  100. Ever heard of PCI Express? by Egekrusher2K · · Score: 2, Insightful

    Touche. However, with the upcoming advances in bus speeds (read: PCI Express) and the available bandwidth to the PCI bus, we won't have to worry about latency when using a coprocessor type piece of hardware. There is room to grow with this new bus to almost outlandish amounts of bandwidth. Not a problem we'll run into any time soon.

    --
    Listen to my experimental-industrial-techno!
    1. Re:Ever heard of PCI Express? by Anonymous Coward · · Score: 0

      bandwidth != latency
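
      A rough way to see the difference, as a sketch: model a transfer as a fixed latency plus size divided by bandwidth. The numbers below (1 microsecond latency, 1 and 2 GB/s) are made up for illustration and are not PCI Express figures:

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple model: fixed per-transfer latency plus time on the wire."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


LATENCY = 1e-6  # hypothetical per-transfer latency in seconds

# Small payload: doubling bandwidth barely changes the total time.
small_slow = transfer_time(64, LATENCY, 1e9)
small_fast = transfer_time(64, LATENCY, 2e9)

# Large payload: doubling bandwidth nearly halves the total time.
large_slow = transfer_time(1_000_000_000, LATENCY, 1e9)
large_fast = transfer_time(1_000_000_000, LATENCY, 2e9)

if __name__ == "__main__":
    print(f"small: {small_fast / small_slow:.2f}x of original time")
    print(f"large: {large_fast / large_slow:.2f}x of original time")
```

      A coprocessor that is fed many small requests lives in the first regime, which is why more bandwidth alone does not fix it.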

  101. Using GPUs for cryptography by mrogers · · Score: 1

    Here's a paper from Columbia University on using GPUs to accelerate cryptographic calculations.

  102. SMP by MicroBerto · · Score: 1
    Here's a question - if I have a really kickass video card, but never play games, run at the highest resolution, or do anything fancy, are cycles being wasted?

    Could those cycles be donated to the main CPU in some sort of SMP scheme?

    --
    Berto
  103. Re:Maybe time for a new generation of math-process by Anonymous Coward · · Score: 0

    pci express will certainly help speed up transfers in that kind of system too

  104. NVidia is already doing this (sort of...) by CRCates · · Score: 1

    Nvidia has already announced Gelato, which uses the GPU to render normally CPU-intensive frames for video production. The link is here, and the film industry apparently already uses this. I think that it requires the increased two-way bus bandwidth that PCI Express offers to be of any use, but it's interesting nonetheless. I suspect that with PCI Express motherboards becoming more prevalent there will be a new market for arming PCs with non-function-specific (e.g. not dedicated to graphics) co-processors that can assist with processing-intensive tasks.

  105. old news by Anonymous Coward · · Score: 0

    This is known as the cycle of reincarnation.

  106. Three questions by pvera · · Score: 2, Interesting

    1. Is anyone except Apple trying to leverage the GPU for non-3D tasks? Apple has been doing Quartz Extreme for a while but I have not heard if anyone else is doing it.

    2. Has anyone tried something similar to what Quartz Extreme does but for non-graphical tasks?

    3. How come GPU makers are not trying to make a CPU by themselves?

    --
    Pedro
    ----
    The Insomniac Coder
    1. Re:Three questions by be-fan · · Score: 2, Informative

      1. Is anyone except Apple trying to leverage the GPU for non-3D tasks? Apple has been doing Quartz Extreme for a while but I have not heard if anyone else is doing it.
      Microsoft, for Longhorn, and freedesktop.org, for X11. Both go quite a bit beyond Quartz Extreme by using D3D/OpenGL for all drawing, not just compositing.

      3. How come GPU makers are not trying to make a CPU by themselves?
      GPUs are very different from CPUs. Graphics is almost infinitely parallelizable, so you are really just limited by how many execution units you can stick on the chip. Assuming enough memory bandwidth, you get a nearly linear increase with increasing numbers of execution units. CPUs, on the other hand, deal with general-purpose code that has an inherent parallelism of about 3-way to 4-way at most. So CPU manufacturers have to do clever things like SMT to take advantage of increased execution resources, but mainly must concentrate on ramping up clock speed and memory bandwidth.

      Interestingly enough, GPU makers wouldn't be very good at making CPUs. GPUs are designed using high-level software, like VHDL. This has a big impact on their maximum clock speed, but that doesn't really matter, because they can always double the number of pipelines and get a nearly 2x increase in performance. Meanwhile, CPUs are designed by hand and tweaked to get every last MHz, because throwing twice as many execution units on the CPU wouldn't help performance much at all.

      --
      A deep unwavering belief is a sure sign you're missing something...
  107. Interesting work that raises some questions... by thurin_the_destroyer · · Score: 4, Informative

    Having done similar work for my final-year project this year, I have some experience attempting general-purpose computation on a GPU. The results that I received when comparing the CPU with the GPU were very different, with many of the applications coming in at 7-15 times slower on the GPU. Further, I discovered some problems, which I mention below:

    ! Matrix results
    As mentioned earlier in the report, the graphics pipeline does not support a branch instruction. So with a limited number of assembly instructions that can be executed in each stage of the pipeline (either 128 or 256 in current cards), how is it possible for them to perform a 1500x1500 matrix multiplication? To calculate a single result, 1500 multiplications would need to take place, and even if they are really clever about how they encode the data into textures to optimise access, they would need two texture accesses for every 4 multiplications. By my calculations that is 1875 instructions, where you can only do 128 or 256.

    My tests found that, using the Cg compiler provided by NVidia, a matrix of size 26x26 could be multiplied before the unrolling of the for loop exceeded the 256-instruction limit.

    One aspect that my evaluation did not get to examine was the possibility of reading partial results back from the framebuffer to the texture memory, along with loading a slightly modified program to generate the next partial result. They don't mention if they used this strategy, so I assume that they didn't.
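
    To make the partial-result idea concrete, here is a sketch of how I imagine it, simulated on the CPU. This is not what the paper did and not something I tested on hardware; the per-pass budget of 64 multiplications is a made-up number standing in for the instruction limit:

```python
MAX_MULS_PER_PASS = 64  # hypothetical budget imposed by the instruction limit


def dot_multipass(row, col, max_per_pass=MAX_MULS_PER_PASS):
    """Compute one output element in several passes.

    `partial` stands in for the value that would be read back from the
    framebuffer into texture memory between passes.
    """
    partial = 0.0
    passes = 0
    for start in range(0, len(row), max_per_pass):
        end = min(start + max_per_pass, len(row))
        # One pass: a short, fully unrolled run of multiply-adds.
        partial += sum(row[k] * col[k] for k in range(start, end))
        passes += 1
    return partial, passes


if __name__ == "__main__":
    value, passes = dot_multipass([1.0] * 1500, [2.0] * 1500)
    print(value, passes)
```

    The cost is one framebuffer-to-texture copy per pass, which is exactly the overhead I did not get to measure.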

    ! Inclusion of a branch instruction
    Even if a branch instruction were to be included in the vertex and fragment stages of the pipeline, it would cause serious timing issues. As a student of Computer Science, I have been taught that the pipeline operates at the speed of the slowest stage, and from designing simple pipelined ALUs, I see the logic behind it. However, if a branch instruction is included then the fragment processing stage could become the slowest, as the pipeline stalls waiting for the fragment processor to output its information into the framebuffer. I believe it is for this reason that the GPU designers specifically did not include a branch instruction.

    ! Accuracy
    My work also found a serious accuracy issue with attempting computation on the GPU. Firstly, the GPU hardware represents all numbers in the pipeline as floating-point values. As many of you can probably guess, this brings up the ever-present problem of 'floating point error'. The interface between GPU and CPU traditionally uses 8-bit values. Once they are imported into the 32-bit floating-point pipeline, the representation has them falling between 0 and 1, meaning that these numbers must be scaled up to their intended representations (integers between 0 and 255, for example) before computation can begin. Combine these two necessary operations and what I saw was a serious accuracy issue, where five of my nine results (in the 3x3 matrix) were one integer value out.
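
    A small CPU-side sketch of the round trip described above. It is an illustration and not my project code: it emulates 32-bit floats with the standard `struct` module, and the helper names are invented for this example. It shows why a result that lands a hair under an integer comes out one too low if it is truncated instead of rounded:

```python
import struct


def f32(x):
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def to_unit(byte_value):
    """8-bit value as it appears inside the pipeline: a float in [0, 1]."""
    return f32(byte_value / 255.0)


def from_unit_truncate(u):
    """Scale back to 0..255 and truncate, which can land one value low."""
    return int(f32(u * 255.0))


def from_unit_round(u):
    """Scale back to 0..255 and round to nearest, which absorbs the error."""
    return int(f32(u * 255.0) + 0.5)


if __name__ == "__main__":
    low = [v for v in range(256) if from_unit_truncate(to_unit(v)) != v]
    print("values that come back one too low when truncated:", low)
```

    Rounding only helps while the accumulated error stays under half a unit, so it is no cure for longer computations.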

    While I don't claim to be an expert on these matters, I do think there is the possibility of using commodity graphics cards for general-purpose computation. However, using hardware that is not designed for this purpose brings some serious constraints, in my opinion. Anyone who cares to look at my work can find it here

    1. Re:Interesting work that raises some questions... by be-fan · · Score: 1

      I believe it for this reason that the GPU designers specifically did not include a branch instruction.
      On NV40-based architectures, you get a branch instruction as well as a 65536 instruction program limit.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:Interesting work that raises some questions... by thurin_the_destroyer · · Score: 1

      > On NV40-based architectures, you get a branch instruction as well as a 65536 instruction program limit.

      Interesting. I would like to see how this affects the timing in the pipeline and whether you would see a noticeable slowdown when you make use of the full 512 instructions, executing 65k times.

      > A deep unwavering belief is a sure sign you're missing something...

      I don't know about `deep unwavering belief` but my work was based on a GeForce FX 5600 so all of my observations are based on that.

    3. Re:Interesting work that raises some questions... by HuguesT · · Score: 1

      > I don't know about `deep unwavering belief` but
      > my work was based on a GeForce FX 5600 so all of
      > my observations are based on that.

      Don't sweat the comment, it's just his sig.

  108. Re:Maybe time for a new generation of math-process by non · · Score: 1

    well if you want to use DSP chips for processing power, just look here.
    DSP PCI card 16 GFLOPS, example - 1 million point Arctan2 2.63 msec. i wonder if they have linux drivers?

    --
    ...vividly encapsulates that post-Watergate/pre-punk/coked-up moment when you could trust no one, least of all yourself.
  109. Bye Bye NVidia by Anonymous Coward · · Score: 0

    As GFX cards get closer and closer to CPUs, Intel are going to fight. Expect the next generation of Pentium to make GPUs obsolete.

  110. Re:Not the Point by feidaykin · · Score: 1

    And where have you been for the past five days? The paleoanthropology thing again? Inquiring stalkers want to know! ;)

    --

    "To confine our attention to terrestrial matters would be to limit the human spirit." -Stephen Hawking

  111. bottlenecks? by Ignatius_VI · · Score: 1

    Wouldn't the PCI/AGP architecture be a major bottleneck if the GPU was used as a general processor?

  112. Re:Link to previous discussion on same/similar sub by EvilNTUser · · Score: 1

    True. But even so, I don't think MS could've pulled it off before Longhorn, especially since it seems like a much more ambitious change than Quartz Extreme was.

    That said, there are a lot of other examples of people accusing Microsoft of "copying" completely obvious improvements from Apple. I thought we liked the concept of sharing ideas here at slashdot? :-)

    P.S. My desktop is an x86, but I also have a PowerBook, and yeah, OSX is good.

    --
    My Sig: SEGV
  113. What a waste of a read... by oxxen · · Score: 1

    I mean, c'mon, if you're going to rehash old ideas, at least do your work thoroughly. There was another comment that mentioned the theoretical peak bandwidth of the P4 being quite a bit larger. That's issue one. Issue two: the guy didn't even bother trying to recode in SSE2. Writing in C++ and compiling with optimizations doesn't get you there. You can get something like 2.5x when doing a matrix multiply in SSE land (depending on the version of the hardware you're using). Issue three: the P4 EE is irrelevant because he did his tests on a Willamette!

    Note, too, that his 1500x1500 matrix is way too large for the CPU cache, so the whole process is memory bound, which shows you nothing about the computational power of the CPU. The GPU is always going to have the memory traffic, but it has a higher speed memory interface to its internal texture memory, which implies that the GPU wins if you're just doing memory reads (though this is a somewhat questionable argument because the GPU still has to contend with the AGP bus). Next, the guy goes on to try to analyze the bottlenecks in GPU design when he apparently didn't even try to figure out why his own damn code ran so slow. Very weak, I say.

    GPUs are great and all, but they're not the be-all end-all of computation. You can do simple mathematical operations (as long as you don't care about high precision math), but you don't get things you need for crypto or video encoding, like bit operations and good branching (though they are moving toward better dynamic branching). So yah, you can do math on a graphics card...whoopdie doo.

    Good luck finding a company that will actually code for a GPU. Unless the instruction sets and CPU interfaces stabilize, nobody will want to code for it. People want standardized platforms. There is code on Wintel platforms that was coded 8 years ago and still runs. Good luck finding that with a graphics app.
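
    The cache argument can be checked with quick arithmetic. The sketch below assumes single-precision (4-byte) elements and a 256 KB L2 cache; both figures are assumptions for illustration rather than numbers taken from the paper, and the variable names are made up.

```python
import math

N = 1500                      # matrix dimension from the paper
BYTES_PER_FLOAT = 4           # assumption: single precision
L2_CACHE_BYTES = 256 * 1024   # assumption: 256 KB L2 cache

matrix_bytes = N * N * BYTES_PER_FLOAT
working_set = 3 * matrix_bytes            # A, B and the result C
times_over = working_set / L2_CACHE_BYTES

# Largest square size for which all three matrices fit in the cache.
max_n_in_cache = math.isqrt(L2_CACHE_BYTES // (3 * BYTES_PER_FLOAT))

print(matrix_bytes, working_set, round(times_over), max_n_in_cache)
```

    Under those assumptions each matrix is 9,000,000 bytes, the three together are roughly a hundred times the cache, and only matrices up to about 147x147 would fit entirely, so a 1500x1500 multiply mostly measures the memory system.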

  114. New Slashdot slogan by Anonymous Coward · · Score: 0

    Slashdot: News for over 90% of the mass general public.

  115. This has brought light to the waste: by jago25_98 · · Score: 1

    There I am, console mode at the command line trying to compile the latest kde release.

    I might have a dedicated card:

    - for DSP audio
    - a dedicated AGP card for graphics
    - another PCI graphics card for xinerama
    - a PCI VPN card

    And yet only the CPU is being used (mostly). This is nuts ...but then without this division there wouldn't be sound and graphics card companies.

    Gimmie gimmie my GPU (glx) accelerated GCC!

  116. Zoë Wood == Geek Überbabe? by mosel-saar-ruwer · · Score: 1

    All of the material at that site credits one Zoë Wood as a co-authoress.

    images.google.com served up this:

    http://www.cs.caltech.edu/~zoe/zoe_grad1.jpg
    and this:
    http://www.csc.calpoly.edu/~zwood/zoe-poppy-sm.jpg
    Yowza!!! Somebody needs to introduce me to that chick.

  117. research done on this stuff by atcdevil · · Score: 0

    http://www.multires.caltech.edu/pubs/GPUSim.pdf
    conjugate gradient and multigrid solver on the GPU

    http://wwwcg.in.tum.de/Research/data/Publications/sig03.pdf
    linear algebra operators on the gpu

  118. the magic of "streaming i/o" by peter303 · · Score: 3, Informative

    GPUs pass input and output from GPU memory at 4-12 bytes per flop. This is much faster than CPUs, which are limited by bus speeds that are likely to deliver a number only every several operations. So CPU benchmarks are bogus, using algorithms that use internal memory over and over again.

    It's not always easy to reformulate algorithms to fit streaming memory and other limitations of GPUs. This issue has come up in earlier generations of custom computers. So there are things like cyclic matrices that map multi-dimensional matrix operations into 1-D streams, and so on.

    The 2003 SIGGRAPH had a session on this topic showing you could implement a wide variety of algorithms outside of graphics.

    1. Re:the magic of "streaming i/o" by cjb110 · · Score: 1

      So CPU benchmarks are bogus, using algorithms that use internal memory over and over again.

      So you make sure your benchmarks work on data larger than the cpu's cache, if you want to test cpu and memory as a combined system.

      --
      ----- I refuse to have an argument with an unarmed person
  119. Folding@Home is actually working on this... by pointwood · · Score: 3, Interesting

    Some day you may be able to Fold proteins with your GPU.

  120. Binary Trees by Anonymous Coward · · Score: 0

    Actually, there is a decent trick that can be employed to increase performance when you are faced with computing "something branchy" such as a binary tree on hardware intended for massive matrix manipulation.

    Collapse *all* the possible branches into a connectivity matrix and reformulate the algorithm into a parallel one rather than a sequential one. This is pretty much the exact opposite of what Hillis did back in the early 1980s with his *LISP implementations of serial algorithms on massively parallel Connection Machine hardware.

    It isn't mindblowing, it just requires a little creativity and mental dexterity. I've used similar tricks myself.
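
    One possible reading of the connectivity-matrix trick, as a minimal sketch: the tree is stored as an adjacency matrix, and instead of walking it node by node, every step applies the same matrix-vector product to a whole "frontier" vector at once. The 7-node complete binary tree, the depth computation and the names (adj, step, frontier) are all assumptions chosen for illustration; the poster gives no details of the actual formulation.

```python
# Complete binary tree with 7 nodes; node i has children 2i+1 and 2i+2.
N = 7
adj = [[0] * N for _ in range(N)]   # adj[p][c] == 1 if c is a child of p
for parent in range(N):
    for child in (2 * parent + 1, 2 * parent + 2):
        if child < N:
            adj[parent][child] = 1

def step(frontier):
    """One matrix-vector product: move every active node to its children."""
    return [sum(adj[p][c] * frontier[p] for p in range(N)) for c in range(N)]

frontier = [1] + [0] * (N - 1)      # start with only the root active
depth = [0] * N
for level in range(1, 3):           # the tree has two levels below the root
    frontier = step(frontier)
    # Same arithmetic for every node; no per-node branching on tree shape.
    depth = [d + level * f for d, f in zip(depth, frontier)]

print(depth)  # [0, 1, 1, 2, 2, 2, 2]
```

    The traversal does far more arithmetic than a pointer-chasing walk would, but every element goes through identical operations, which is the shape of work that matrix-oriented hardware handles well.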

  121. They are by Sycraft-fu · · Score: 1

    If you want to see it best demonstrated, try an old-new game comparison of great games. One I like is Doom and Unreal Tournament 2004. The bots in UT 2004, while not as cunning as a human, are pretty damn good. They work together, they understand how to retreat, how to dodge, etc. They actually feel like fairly realistic opponents at lower skill levels (at higher levels their speed and accuracy outstrips their tactics). The enemies in Doom, on the other hand, are STUPID. They have basically only one strategy: follow player, attack them. It's easy to kite them around and draw them into traps. Likewise dodging their shots is simple since they always shoot at where you are now, not where you might dodge to.

    Doom is another excellent example since it has been updated with modern graphics. The Doomsday/jDoom project (www.doomsdayhq.com) has rewritten the Doom engine to use OpenGL. Other people have contributed 3d models for enemies, high resolution textures, and quality music (me). The result is something far and away better looking and sounding than the original Doom. It has made Doom better, not worse. You thought the music was spooky when your little FM card was squawking it out? Try it now.

    Also the graphics cards DO lead to better AI. AI is processed by your CPU and to do good AI, you need a lot of CPU time. That is one of two chronic problems with game AI. You just don't have enough CPU time to spend on it without screwing the game engine (the other is that it is just generally hard to write realistic AI algorithms). Well, modern graphics cards are taking most of the graphics load off the CPU, so there is more time available for other tasks, like AI.

    1. Re:They are by Entropius · · Score: 1

      You make a good point.

      Tactical AI has gotten pretty good, especially in twitch-type games like UT2K4: the bots are competent opponents without being "obviously bots".

      However, strategic AI is quite lacking. I am thinking mainly of D&D-type games, as that's where most of my experience lies.

      For instance, in games like ToEE or NWN, enemies have basically one reaction to the player's presence: fight until they die. I want enemies that are obviously overmatched to retreat and go fetch help... to set ambushes on their own without having special scripting to do so... etc.

    2. Re:They are by Sycraft-fu · · Score: 1

      Well I think that kind of AI is improving too. Haven't seen it yet in an RPG, but in new RTS's it's getting much better. I was quite surprised at the AI upgrades in the C&C Generals expansion pack Zero Hour. The AI is pretty decent and does understand how to scout, build forces, withdraw, etc. Still has a long way to go, but getting better.

  122. Re:Link to previous discussion on same/similar sub by Anonymous Coward · · Score: 0

    If you can, check out OS X's directory structure, it's beautiful.

    Utter crap, fanboy. OS X's directory structure is a basic UNIX system hidden by the file manager, with applications thrown on to '/'.

  123. Re:Link to previous discussion on same/similar sub by Tablizer · · Score: 2, Funny

    Quote from that topic: "Reminds me of the good old days when you used the processors in the C64 tapedrive to compute stuff. Wouldn't want to waste those precious cycles."

    I wonder if that is where they kept their porn back then also.

  124. Well, this isn't for general computing... by Doogie5526 · · Score: 1
    I've been talking with a few people about applying GPU power to do the skin solver (simulation) in a muscle simulation (which is computationally harder to figure out than the muscles themselves). Not only would this be faster, it would also make use of the GPU, which normally sits idle when rendering 3D animation for TV or film.

    This also could help bring more advanced methods of 3d animation to smaller, cheaper computers/studios.

  125. Re:Link to previous discussion on same/similar sub by Anonymous Coward · · Score: 0

    Incidentally, e17 exploited (hardware) GL rendering of 2D graphics via evas a bit before Apple put that into OS X.

    How does that compare? e17 still hasn't been released. Quoth enlightenment.org : Sat May 1 - benr - Enlightenment DR16.7-pre1 Released What you're saying is that e17 was announced before Quartz Extreme was released.

    Why do you jackasses have to tack on lies at the end of your otherwise informative posts?

  126. Re:Link to previous discussion on same/similar sub by N1KO · · Score: 1

    Just because it isn't stable doesn't mean it doesn't exist. Tell all the people running e17 apps like entrance that they really don't have hardware rendering.

  127. interesting conundrum by CAIMLAS · · Score: 1

    This raises the question: would it be possible to get a fleet of systems with multiple AGP ports, so as to utilize multiple GPUs in each machine for scientific and/or industrial purposes (for instance, in the film industry)? Or does the AGP spec not allow for a second AGP slot?

    I wonder how quickly someone could break top-notch encryption with 10 servers x 5 GPUs each.

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    1. Re:interesting conundrum by karmajudgment · · Score: 1

      It is actually an excellent idea to allow for multiple AGP busses in PCs, as there are many applications that require the use of more than one or two displays (3D projection-based displays, Virtual Reality spaces -- Caves etc). Currently, networked multi-processed machines are utilized for such applications.

    2. Re:interesting conundrum by CAIMLAS · · Score: 1

      Hrm, but what about dualhead AGP cards? Are they not sufficient?

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    3. Re:interesting conundrum by drfreak · · Score: 1

      I bet this will happen when everything switches to PCI Express.

    4. Re:interesting conundrum by karmajudgment · · Score: 1

      For a 4 plane Cave which projects in 3d, you would need 8 video outputs -- not just two. Also, it is useful to have a monitor video output as well to control the video rendering host -- but of course it is possible on many platforms to do this remotely.

  128. There's more at work here than just snobbery by MilenCent · · Score: 1

    Since I get the sense that some people would consider me one of the snobs to which these messages refer, I'll respond.

    *YES*, there were plenty of awful games back in the old days, too. The Atari 2600 died as a system, not because of advancing technology, but because of the flood of awful games for it. Nintendo started their infamous licencing plan, not just as a way to control the market for their system and thus make a pile of cash, but also to avoid the situation where hundreds of third-party companies could make horrendous games for their system and damage its reputation. Even so, there was still Color Dreams and the other unlicenced 3rd party developers, and most (but not all - check out Krazy Kreatures) of their games were not worth the silicon they were stamped on.

    Old arcade games, likewise, had their share of awful titles. And some older games have game play that doesn't hold up to today's standards. (I actually consider Donkey Kong to be one of these.)

    But it's still a fact that many recent games don't offer much more over the games upon which they were based than upgraded graphics. I, and I suspect most of the people who are being slammed here, argue that improved graphics and sound are nice, but not enough in themselves.

    Furthermore, think about the number of old games that are still remembered today. There are actually a *lot* of them, most released in a surprisingly short period of time. Space Invaders, the first extremely popular game, was released in 1978. Pac-Man was released in 1981. The Crash was late 83/early 84. That's just six years, and arcades got Robotron, Defender & Stargate, Joust, Asteroids & Asteroids Deluxe, Tempest, Donkey Kong & Jr, Frogger, Sinistar, Battlezone, Pac-Man & Ms. & Super, Centipede & Millipede, Pole Position, Tron, Q*Bert, Star Wars, and a good number of games I've neglected. Look at arcades today, hell I'll even let you include consoles and PCs, and try to argue there's a similar amount of innovation.

    Further, all of these games are substantively different from each other. Even the most obvious sequels here, the Pac-Man trilogy, had enough differences that a player good at one might not be at one of the others: Ms. Pac-Man was resistant to patterns, and Super Pac-Man had the keys/doors play mechanic that allowed the player's actions to change the maze.

    Now there *were* blatant-rip-offs in those days. There were many Pac-Man clones produced, most inferior to the original game and some that were basically the original game with hacked graphics. Many of these were bootlegs. And there were games that were released with minor graphics and gameplay changes, like Super Zaxxon.

    But in general, it was a lot easier then to make a game with a unique play mechanic, try to sell it and succeed than it is now. That's why old arcade and console games get released in 24-packs for $20 these days and sell pretty well. Nostalgia is certainly a factor, but it helps that the games stand up.

  129. No more impressive? by Anonymous Coward · · Score: 0

    The peak throughput of AltiVec is about twice that of SSE. P3/4 and K7/8 can all do 2 single precision adds and 2 muls per cycle while G4/5 can do 4 fused multiply-adds.
    Actual usable performance is also way better, since you need fewer instructions for the same thing: they don't need to overwrite input registers, and you have more registers around as well.
    (then the real problem is that there's little properly vectorized code around..)
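
    The "about twice" figure follows directly from the per-cycle numbers quoted above, counting a fused multiply-add as two floating point operations. A quick check using only those quoted figures (the variable names are made up):

```python
# Peak single-precision floating point operations per clock cycle,
# using only the figures quoted in the comment above.
sse_flops_per_cycle = 2 + 2          # 2 adds + 2 multiplies
altivec_flops_per_cycle = 4 * 2      # 4 fused multiply-adds, 2 flops each

ratio = altivec_flops_per_cycle / sse_flops_per_cycle
print(sse_flops_per_cycle, altivec_flops_per_cycle, ratio)  # 4 8 2.0
```

    This is a per-cycle comparison only; actual throughput also depends on clock speed, which the comment does not address.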

  130. Re:Link to previous discussion on same/similar sub by curious.corn · · Score: 0, Flamebait

    perhaps he was referring to the .app directories that contain all resources, non shared libs, executables and various resources necessary to an application, opposed to sputtering crap all over the c:/ filesystem... utter crap, M$ fanboy...

    --
    Mi domando chi à il mandante di tutte le cazzate che faccio - Altan
  131. What could it do? by bob65 · · Score: 1
    What could my video card be doing for me while I am not playing the latest 3d games?

    Doing eyecandy for Longhorn

  132. Re:transistor counts through the ages (PPC) by nothings · · Score: 1

    Good call. Updated.

  133. Re:Link to previous discussion on same/similar sub by mrchaotica · · Score: 2, Funny

    Files aren't supposed to go all together; they're supposed to be divided by type: /bin, /etc, /lib, etc.!

    - UNIX fanboy

    (yes, that was a joke; actually, I'm looking forward to database-based file systems - but not proprietary ones)

    --

    "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

  134. Re:Link to previous discussion on same/similar sub by RzUpAnmsCwrds · · Score: 1

    "Windows users might have it by 2007"

    Longhorn will likely ship in 2006.

    Moreover, the DCE goes far beyond what Quartz extreme does. QE is actually quite primitive - very few operations are GPU accelerated. That's why window resizing is still horribly slow.

  135. Faster vs fast enough by Latent+Heat · · Score: 1
    I second everything you just said. "When you are unlucky, the next CPU you buy is faster in performing the task itself." To that I would add, "that fancy whiz-bang graphics card/audio processors/DSP chip is no longer being supported."

    The problem is Moore's law. There is always going to be some special purpose DSP-GPU-coprocessor thingy that is faster than the CPU, but if you wait long enough, the CPU is going to be fast enough, and the CPU is going to have a much bigger customer base for your software product than requiring people to go out and buy the DSP-GPU-coprocessor du jour.

  136. Linux Distro by Anonymous Coward · · Score: 0

    All we need now is a Linux distro that will run in a Windows' window and have its own processor (GPU).

  137. Re:Link to previous discussion on same/similar sub by cehardin · · Score: 2, Informative
    Utter crap, fanboy. OS X's directory structure is a basic UNIX system hidden by the file manager, with applications thrown on to '/'.

    Boy, you really have no idea what the heck you are talking about, do you? Of course the basic UNIX stuff is there, /bin, /sbin, /usr/local, all that stuff.

    Those directories have very few files in them; you will also notice a lack of init.d startup scripts. Most of the system is contained in /System.

    For example, rather than /etc/init.d, it has startup services in /System/Library/StartupItems. For example there is an apache folder, in that are the scripts necessary to start Apache along with a file which describes Apache's dependencies. Also, these startup items are multi lingual. You can boot into any language you want. All of this in one folder. That's f*cking elegance, yet it is only a very small example.

    Check it out, you will see.

  138. Re:Link to previous discussion on same/similar sub by b-baggins · · Score: 1

    And, of course, we all know that Apple will make absolutely no improvements from now until 2006 when Longhorn ships.

    Personally, I don't think Longhorn will ever ship because each time Apple releases the next version of OS X, Microsoft has to scrap all their Longhorn work and start over again to catch up.

    --
    You can tell a great deal about the character of a man by observing those who hate him.
  139. Already in use in audio processing - UAD card by sleepcountry · · Score: 1

    There's some speculation that a fairly popular audio DSP card uses a Chromatic Research video chip. This card supposedly very accurately emulates some very prized vintage audio processing equipment like the 1176 and the LA2A compressors and limiters.

    http://www.chrismilne.com/uadforums/topic.asp?TOPIC_ID=579

    I tried the UAD-1 before, and I found the biggest problem was the extra latency that it ADDED to my signal chain. Pulled it out and took it back to the store.

    sleep

  140. Re:Link to previous discussion on same/similar sub by Anonymous Coward · · Score: 0
    Longhorn will likely ship in 2006. Moreover, the DCE goes far beyond what Quartz extreme does.
    You are right about several things. Longhorn is supposed to be released in mid-to-late 2006. That is, if there are no additional delays. Hummm...

    And the DCE will beat the shit out of Quartz Extreme. Yes, in late 2006 Longhorn's DCE will do much, much more than what Quartz extreme does *now*.

    Apple has shown an outstanding ability to rock the OS world every year since OS X was released. And you never know how their next OS will amaze you until it's presented a few months before it's released (usually at the WWDC). Who knows where OS X will be by 2007^H6 ? (That is, outside Lord Jobs's closest circle.)

    Oh, and apparently the acronym DCE (Desktop Composition Engine) was changed to DWM (Desktop Windows Manager).

  141. Re:Not the Point by djplurvert · · Score: 1

    This is completely off topic. But, way way way back when, my buddy's 60s Mustang fastback crapped out on him. The only thing we had to get it back to the barracks was his Kawasaki 400. Well, he ties 'em together with some rope and gives me some "pointers" on driving the towed vehicle. Since I had just learned to drive (in a military jeep), experience was not my strong suit. At any rate, he manages to get us going and tows me through the fog of burnt clutch smell he created and things are going fine until we start going down a slight hill. I'm catching up to him fast and panic, I tap the brakes, the rope snaps, he brakes, I let go, I bump his bike, his tire squeals, he guns it and I coast to a stop at the bottom of the hill. After I got out he started giving me the business about how I suck as a driver yada yada yada when an MP (military police) jeep pulls up. The cop gets out and asks us why we are out here at 4 in the morning and Rocky feeds him some shit when the cop sees the broken rope. He asks, "you guys trying to tow this car with that motorcycle?", we laugh, and reply in unison "of course not". He warns us about the associated dangers and leaves us to deal with the dead car. At that point, we decided to just push the car back. I got to bed around 6am that morning.

    I'm not making this up....

  142. Re:Link to previous discussion on same/similar sub by RzUpAnmsCwrds · · Score: 1

    You don't think that the world's largest software company will ever release a new version of their most profitable product?

    "And, of course, we all know that Apple will make absolutely no improvements from now until 2006 when Longhorn ships."

    Of course they will. But from what I've seen of Longhorn, it is a fundamental step forward in the way people interact with their computers. There have been very few fundamental UI changes to OS X since its release. Refinements, polish, new features - yes. OS X is a very nice OS, and it will be more so by the time that Longhorn is released. But in 2006, OS X will still be OS X. Longhorn is something totally different from XP. It is a different kind of operating system. That's what people don't get.

  143. Re:Link to previous discussion on same/similar sub by Anonymous Coward · · Score: 0

    YHBT. YHL. HAND.

  144. Re:Link to previous discussion on same/similar sub by cliffwoolley · · Score: 2, Informative

    As for organizations beating slashdot to the punch on this one, that's true... but it's good to see this getting even more exposure. :)

    GPGPU (General-Purpose computation on GPUs) was a hot topic at various conferences in 2003; a number of papers were published on the subject. At SIGGRAPH 2004 there will be a full-day course on GPGPU given by eight of the experts in the field (including myself).

    Mark Harris of NVIDIA maintains a website dedicated to GPGPU topics, including discussion forums and news postings. Well worth a browse if you're interested in GPGPU topics.

    I look forward to seeing some of you at SIGGRAPH! :)

    --Cliff

  145. Re:Link to previous discussion on same/similar sub by cehardin · · Score: 1

    Whether he or she was trolling is beside the point. I'm just putting out facts for the benefit of others.

  146. Re:Link to previous discussion on same/similar sub by Crazy+Eight · · Score: 1
    What you're saying is that e17 was announced before Quartz Extreme was released.

    No, that's not what I'm saying at all. I was only pointing out that the notion wasn't an insight discovered by geniuses at Apple that non-promethean sheep can merely copy. The original poster erroneously conflated Apple's GUI architecture with the topic at hand and made a point of noting that they "beat us to it". If you go here you'll find a Slashdot article from January 21, 2001 entitled, "Rasterman's New Toy: EVAS". Evas was (still is?) his canvas library that included a (hardware accelerated) OpenGL backend. I was building it from cvs at the time, clicking "Hardware" on the demo program, and watching the FPS go nuts. E17 may never see the light of day, but the first evas (IIRC, it was dropped and rewritten) was a functioning, solid proof of concept that received a lot of attention. Jaguar was released a little more than a year and a half later on August 24, 2002.

  147. Amusing by Pan+T.+Hose · · Score: 1

    Using GPUs For General-Purpose Computing

    I'm glad that finally they started to use the General-Purpose Unit. What took them so long?

    your huge ignorance was blocking the screen.
    it stands for "Graphics Processing Unit".

    What do you expect, he's a phd, you can't expect him to be based in the real world...

    You people remind me of some of my students who constantly make me wondering whether one indeed needs a philosophiae doctor degree to have any sense of humour whatsoever... Apparently the sophisticated art of satira mustn't be "based in the real world" (sic), must it? In any event, I find this "real world" antiintellectualism certainly amusing, especially when I get moderated down on Slashdot because of my doctorate or Mensa membership. Annoyingly infantile, yet amusing.

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
    1. Re:Amusing by rokzy · · Score: 1

      stop crying that no one likes you. it's not about your PhD or your Mensa membership, it's about you being sarcastic and belittling others' work when in fact you are just wrong.

  148. General Purpose vs. Special Purpose by winchester · · Score: 1

    So a special purpose processor is better at doing some things than a general purpose processor. How new is that? That Cray X-1 upstairs is very good at doing vector operations, however it sucks at doing anything scalar (like compiling, perhaps?) This is why Cray gives you a general purpose workstation to go with the X-1 as a compilation workstation (among others).

  149. Re:Link to previous discussion on same/similar sub by b-baggins · · Score: 1

    You don't think that the world's largest software company will ever release a new version of their most profitable product?

    You may want to reboot your ironic humor sensor.

    --
    You can tell a great deal about the character of a man by observing those who hate him.
  150. ACM Workshop: General Purpose Computing Using GPUs by manocha · · Score: 1
    There has been recent interest in using GPUs for different applications, and we are organizing the first ACM Workshop on General Purpose Computing using GPUs (GPGP) in LA on August 7-8, right before SIGGRAPH. More information about this workshop is available at:

    http://www.cs.unc.edu/GP2

    The participants include researchers and developers from computer architecture, software systems, high-performance and scientific computing in addition to computer graphics. The "Call for Posters" will be coming out soon (with deadline of June 01). Check the WWW site for more details.

  151. Laser printers... by Taed · · Score: 1

    I used to do something similar back in grad school with our laser printers. The CPU inside the laser printers (some version of a Motorola 68000) was more powerful than the machines that we had on our desks, and they had a fair amount of memory as well. So, for some computations, I'd write little PostScript programs, have the laser printer churn on it overnight (of course, no one could print in the meantime), and then print out the answer when it was done.

  152. It BEGS to be asked: by fikx · · Score: 1

    But....can you run linux on it?!

    No, really, can you?

    --
    AB HOC POSSUM VIDERE DOMUM TUUM
  153. Reference site for General Purpose GPU by dumky · · Score: 1

    General-Purpose Computation Using Graphics Hardware. Anyone interested in this topic should check that site out.

  154. Good thinking Paul! by FutureExpressionist · · Score: 1

    And thanks to other readers for the follow-up replies.. I'm looking at gpgpu.org with interest.

  155. Crying? by Pan+T.+Hose · · Score: 1

    What do you expect, he's a phd, you can't expect him to be based in the real world...

    You people remind me of some of my students who constantly make me wondering whether one indeed needs a philosophiae doctor degree to have any sense of humour whatsoever... Apparently the sophisticated art of satira mustn't be "based in the real world" (sic), must it? In any event, I find this "real world" antiintellectualism certainly amusing, especially when I get moderated down on Slashdot because of my doctorate or Mensa membership. Annoyingly infantile, yet amusing.

    stop crying that no one likes you.

    I was not crying, not at all. Quite to the contrary, in fact--I was laughing. That is what I usually do when I find something amusing. And that is why I have written that I found it amusing. I thought it was self-explanatory.

    it's not about your PhD or your Mensa membership, it's about you being sarcastic and belittling others' work when in fact you are just wrong.

    I wouldn't call it sarcasm but rather satira. (And, in fact, I have called it satira.) To be honest, I fail to understand how could I have achieved said satira without having been wrong... Do you really think that "Using GPUs For General-Purpose Computing--I'm glad that finally they started to use the Graphical Processing Unit" would have been moderated as Score:5, 100% Funny?

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
  156. Re:Link to previous discussion on same/similar sub by t4k1s · · Score: 1

    Rasterman actually did the same thing before. I think it was 6 years ago when Evas used OpenGL for its compositing of pixmaps to be used for Enlightenment 0.17. Unfortunately, E0.17 still isn't there yet.

  157. Re:Not the Point by kfg · · Score: 1

    It's a bit more plebeian computer nerdy than that. There's a new mod out for Grand Prix Legends. I've been gaming and catching up on the related forums.

    That and working on some ideas for a bicycle towed popup camper, spurred by acquiring a pair of wheels really cheap at a garage sale, which is why that particular post caught my eye and engendered my particular response.

    Towing a popup camper with a motorcycle actually makes a lot more sense than towing one with a minivan, and you can buy commercial products.

    For bicycles I'm reduced to DIY. It poses some interesting engineering problems.

    KFG

  158. Re:Not the Point by feidaykin · · Score: 1
    The game looks interesting. Amazingly, 3dgamers.com has an 18 MB demo that I'm downloading right now. I usually have difficulty getting demos from farther back than 2001, which is frustrating when you have rather humble hardware. So a game from 1998 is great, since it's a year a lot of my hardware belongs in. Come to think of it, 98 was a good year for gaming. StarCraft, Half-Life... I do recall having difficulty with homework thanks to those titles. Yes I was in high school then, heh... Either I'm a little kid or you're an old man, which do you prefer? ;)

    --

    "To confine our attention to terrestrial matters would be to limit the human spirit." -Stephen Hawking