NVIDIA Jetson TX1 Performance Shines For GPU Computing (phoronix.com)
An anonymous reader writes: Following last week's announcement of the Jetson TX1 development board, NVIDIA is now allowing independent performance reports for its $599 USD 64-bit ARM development board. Linux results published by Phoronix show very strong performance for the Jetson TX1 when comparing its Cortex-A57 speed to the Tegra K1 and older Tegra SoCs, along with other ARM hardware like Calxeda and Raspberry Pi. The Jetson TX1 was generally multiple times faster than ARM hardware a few years old. The graphics performance was twice as fast as the year-old Jetson TK1 thanks to the Maxwell GPU. Compared to x86 hardware, performance in CPU-bound tasks is comparable to an AMD Sempron/Phenom, except when using GPGPU computing, where it is faster than Intel Skylake and Xeon processors. The Jetson TX1 had a peak power consumption of 16 Watts and an average power use of under 10 Watts.
Tegra X1 used to be the fastest ARM/GPU SoC - but now the A9X in the new iPad Pro leaves it in the dust.
Meanwhile, the Cortex-A57 is probably one of ARM's worst cores to date. It really needs to be implemented on FinFETs to avoid overheating and throttling, and it performs poorly compared to custom ARM cores.
I'm pretty pumped about playing with the dev kit. It has a heterogeneous memory architecture between the CPU and GPU. For lots of GPGPU applications, the latency of transfers between system RAM and the GPU can be a bottleneck. You're transferring huge chunks of data, and if you need to bounce the problem back and forth between the CPU and GPU (which is pretty common), or if you have any real-time requirements, it can be a big deal. In many applications, 40% or more of the run time is spent just moving data back and forth between the GPU and CPU.
For example, lots of people used the TK1 (predecessor to the TX1) for computer vision applications because it could run faster than even the fastest GPU, simply because you didn't have the memory transfer times. But the TK1 was slightly underpowered for these applications. The TX1 should close that gap and really allow true GPU/CPU co-processing, versus shuttling data around in memory.
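To put a rough number on that, here is a back-of-the-envelope sketch in Python. The 40% figure is the one quoted above; the function names and the model itself (copies are the only overhead, and they vanish completely with shared memory) are my own simplifying assumptions, not measurements from the TX1.

```python
def speedup_without_transfers(transfer_fraction: float) -> float:
    """Best-case speedup if the CPU<->GPU copy time vanished entirely.

    transfer_fraction is the share of total run time spent copying data
    (hypothetical input; 0.4 matches the "40%+" quoted in the comment).
    """
    if not 0.0 <= transfer_fraction < 1.0:
        raise ValueError("transfer_fraction must be in [0, 1)")
    # Remaining work is (1 - transfer_fraction) of the original run time.
    return 1.0 / (1.0 - transfer_fraction)


def total_time(compute_s: float, copy_s: float, bounces: int) -> float:
    """Run time when each CPU/GPU bounce costs one copy in and one copy out."""
    return compute_s + 2 * bounces * copy_s


if __name__ == "__main__":
    # With 40% of the time in transfers, removing them gives about 1.67x.
    print(f"{speedup_without_transfers(0.4):.2f}x")
    # Assumed workload: 6 s of compute, 1 s per copy, 2 bounces.
    print(f"{total_time(6.0, 1.0, 2):.1f} s with copies vs 6.0 s without")
```

This is only an upper bound. Shared memory does not make access free, so real gains will be smaller, but it shows why a slower GPU with no copies can beat a faster one that spends much of its time waiting on the bus.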
Nvidia will be bringing heterogeneous computing to the desktop soon too -- they're already making it happen with IBM, and then their roadmap is to push it to x86 land.
I'm rather surprised there was no AMD APU in the GPGPU comparisons. That is, after all, rather the whole point of the APUs. And due to the super low latency of the AMD ones, they tended to do rather well compared to other chips. The HSA stuff seems to go rather beyond the GPGPU stuff in terms of its range of applications.
The USP of AMD's APUs used to be having the GPU and the CPU on the same die. This is true for Jetson as well, but it is compatible with the whole CUDA universe, too. So now NVIDIA is eating AMD's lunch.
Looks like all the graphs on Page 7 have an incorrect TK1 label instead of the correct TX1?
http://www.phoronix.com/scan.p...
With NVIDIA getting serious about low-power devices, the next few years are going to be very interesting as AMD, ARM, Broadcom, and Intel all duke it out.
I can't wait till OpenCL is supported as well.