Tegra 4 Likely To Include Kepler DNA
MrSeb writes "Late last week, Jen-Hsun Huang sent a letter to Nvidia employees congratulating them on successfully launching the highly acclaimed GeForce GTX 680. After discussing how Nvidia changed its entire approach to GPU design to create the new GK104, Jen-Hsun writes: 'Today is just the beginning of Kepler. Because of its super energy-efficient architecture, we will extend GPUs into datacenters, to super thin notebooks, to superphones.' (Nvidia calls Tegra-powered products 'super,' as in super phones, super tablets, etc, presumably because it believes you'll be more inclined to buy one if you associate it with a red-booted man in blue spandex.) This has touched off quite a bit of speculation concerning Nvidia's Tegra 4, codenamed Wayne, including assertions that Nvidia's next-gen SoC will use a Kepler-derived graphics core. That's probably true, but the implications are considerably wider than a simple boost to the chip's graphics performance."
Nvidia's CEO is also predicting this summer will see the rise of $200 Android tablets.
So will this version be something more than a paper tiger? So far the Tegras have sounded better on paper than their real world performance ends up being.
Half of our department's research sits directly on CUDA, now, and I haven't really had this experience at all. CUDA is as standard as you can get for NVIDIA architecture -- ditto OpenCL for AMD. The problem with trying to abstract that is the same problem with trying to use something higher-level than C -- you're targeting an accelerator meant to take computational load, not a general-purpose computer. It's very much systems programming.
I'm honestly not really sure how much more abstact you could make it -- memory management is required because it's a fact of the hardware -- the GPU is across a bus and your compiler (or language) doesn't know more about your data semantics than you do. Pipelining and cache management are a fact of life in HPC already, and I haven't seen anything nutso you have to do to support proper instruction flow for nVidia cards (although I've mostly just targeted Fermi).
In my experience, GPU programming works exactly like you'd expect it to work. Your nightmare doesn't sound like it's with GPU programming, it sounds like it's with NVidia's marketing.
GPU processors are really small, so everything you've listed here is expected. The code size, variable limits, etc etc. The advantage is you have thousands of them at your disposal. That makes GPUs extremely good when you need to run a kernel with x where x is from 0 to a trillion. Upload the problem set to VRAM, and send the cores to work.
Stuff like C++ and high level languages is also not good for this sort of work. I'm not even sure why people are bothering with C++ on GPGPU to be perfectly frank. Again, you're writing kernels here, not entire programs. C++ is honestly bulky for GPGPU work and I can't imagine what I'd use it for. Both CUDA and OpenCL are already pretty high level, any further past that and you're risking sacrificing performance.
Interpreted code is also good. It's usually JIT compiled for the architecture you're working on. In the case of OpenCL and CUDA, it could be recompiled to run on an ATI card, NVidia card, or local CPU, all of which have different machine languages that you won't know about until runtime.
It sounds like you're angry because GPU programming isn't very much like programming for CPUs, and you'd be right. That's the nature of the hardware, it's built very different and is optimized for different tasks. Whether that's because you were sold a false bill of goods by NVidia, I don't know. But it doesn't mean GPU programming is broken, it just may not be for you. It mostly sounds like you're just trying to cram too much into your individual kernels though.