NVIDIA Jetson TX1 Performance Shines For GPU Computing (phoronix.com)

Posted by samzenpus on Monday November 16, 2015 @07:30AM from the running-the-numbers dept.

An anonymous reader writes: Following last week's announcement of the Jetson TX1 development board, NVIDIA is now allowing independent reports of performance for its $599 USD 64-bit ARM development board. Linux results published by Phoronix show very strong performance for the Jetson TX1 when comparing its Cortex-A57 cores against the Tegra K1 and older Tegra SoCs, along with other ARM hardware like Calxeda and Raspberry Pi. The Jetson TX1 was generally multiple times faster than ARM hardware from a few years ago. Graphics performance was twice that of the year-old Jetson TK1, thanks to the Maxwell GPU. Compared to x86 hardware, performance in CPU-bound tasks is comparable to an AMD Sempron/Phenom, but with GPGPU computing it is faster than Intel Skylake and Xeon processors. The Jetson TX1 had a peak power consumption of 16 Watts and an average power use of under 10 Watts.

22 comments

Tegra X1 used to be the fastest ARM/GPU SoC by edxwelch · 2015-11-16 07:44 · Score: 1, Troll

Tegra X1 used to be the fastest ARM/GPU SoC - but now the A9X in the new iPad Pro leaves it in the dust.
Meanwhile, the Cortex-A57 is probably one of ARM's worst cores to date. It really needs to be implemented on FinFETs to avoid overheating / throttling, and it performs poorly compared to custom ARM cores.
1. Re:Tegra X1 used to be the fastest ARM/GPU SoC by Anonymous Coward · 2015-11-16 08:30 · Score: 1
  
  Tegra X1 used to be the fastest ARM/GPU SoC - but now the A9X in the new iPad Pro leaves it in the dust.
  Meanwhile, the Cortex-A57 is probably one of ARM's worst cores to date. It really needs to be implemented on FinFETs to avoid overheating / throttling, and it performs poorly compared to custom ARM cores.
  Yup, the A9X is faster than a chip several years old. Such breathtaking engineering prowess!
2. Re:Tegra X1 used to be the fastest ARM/GPU SoC by edxwelch · 2015-11-16 09:30 · Score: 1, Troll
  
  What are you talking about? Tegra X1 came out a few months ago.
Heterogeneous Memory FTW by kaiser423 · 2015-11-16 07:51 · Score: 1, Interesting

I'm pretty pumped about playing with the dev kit. It has a heterogeneous memory architecture between the CPU and GPU. For lots of GPGPU applications, the latency of transfers between system RAM and the GPU can be a bottleneck. You're transferring huge chunks of data, and if you need to bounce the problem back and forth between the CPU and GPU (which is pretty common), or if you have any real-time requirements, it can be a big deal. In many applications, 40%+ of the time can be spent just transferring your data back and forth between GPU and CPU.
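
To put a rough number on that, here is a back-of-the-envelope sketch of how much of a frame's time goes to copies. Everything in it is an assumption picked for illustration, not a measurement: the function name `transfer_fraction`, the 8 GB/s effective link bandwidth, the 1080p RGBA frame, and the 3 ms of GPU compute are all made up.

```python
def transfer_fraction(bytes_moved, link_bytes_per_s, compute_seconds):
    """Fraction of total time spent copying data between CPU and GPU.

    Simplified model: copies and compute do not overlap, and the link
    runs at a constant effective bandwidth.
    """
    transfer_seconds = bytes_moved / link_bytes_per_s
    return transfer_seconds / (transfer_seconds + compute_seconds)


# Assumed workload: one 1920x1080 RGBA frame copied to the GPU and back.
frame_bytes = 1920 * 1080 * 4
round_trip_bytes = 2 * frame_bytes

# Assumed figures: 8 GB/s effective link bandwidth, 3 ms of GPU compute.
discrete = transfer_fraction(round_trip_bytes, 8e9, 0.003)

# With memory shared between CPU and GPU, nothing has to be copied.
shared = transfer_fraction(0, 8e9, 0.003)

print(f"separate memories: {discrete:.1%} of the time spent in transfers")
print(f"shared memory:     {shared:.1%} of the time spent in transfers")
```

With those assumed numbers the copies come out to roughly 41% of each frame, which is the kind of overhead shared memory removes. Real pipelines can overlap copies with compute, so treat this as an upper-bound style estimate.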

For example, lots of people used the TK1 (predecessor to the TX1) for computer vision applications because it ran faster than even the fastest GPUs, simply because it had no memory transfer times. But the TK1 was slightly underpowered for these applications. The TX1 should close that gap and really allow true GPU/CPU co-processing, versus shuttling data around in memory.

Nvidia will be bringing heterogeneous computing to the desktop soon too -- they're already making it happen with IBM, and then their roadmap is to push it to x86 land.
1. Re:Heterogeneous Memory FTW by Andy+Dodd · 2015-11-16 08:07 · Score: 1
  
  It's too bad that the TX1 dev board is 3 times the price of the TK1 dev board (Jetson TK1 was $192), even though the chip itself seems to cost about the same or not much more. The evidence is that TX1 consumer products are reasonably priced: the TX1-based Shield ATV is $199 including a controller, the TK1-based Shield Tablet was $299, and $100+ for battery/display makes a lot of sense.
  I was really hoping for a successor to the Jetson TK1 that used the X1, but this isn't really a successor - despite a different name it's clearly a completely different animal.
  
  --
  retrorocket.o not found, launch anyway?
2. Re:Heterogeneous Memory FTW by Arkh89 · 2015-11-16 08:12 · Score: 1
  
  One of the main problems with this on the desktop is bandwidth. A true heterogeneous memory architecture between CPU and GPU, but with less bandwidth than current on-device memory (around 250GB/s), would hurt a vastly larger pool of applications than it would benefit.
3. Re:Heterogeneous Memory FTW by Anonymous Coward · 2015-11-16 13:14 · Score: 1
  
  Meanwhile, this has already been done for a few years with AMD APUs...
4. Re:Heterogeneous Memory FTW by Shinobi · 2015-11-16 17:01 · Score: 1
  
  And even AMD were well over a decade late to the party, compared to the Silicon Graphics O2, which used UMA.
Peculiar omission by serviscope_minor · 2015-11-16 07:55 · Score: 5, Interesting

I'm rather surprised there was no AMD APU in the GPGPU comparisons. That is, after all, rather the whole point of the APUs. And due to the super low latency of the AMD ones, they tended to do rather well compared to other chips. The HSA stuff seems to go rather beyond the GPGPU stuff in terms of its range of applications.

--
SJW n. One who posts facts.
1. Re:Peculiar omission by Anonymous Coward · 2015-11-16 07:58 · Score: 0
  
  AMD only today announced their plans for source translations of CUDA codes... The Jetson TX1 doesn't support OpenCL but only CUDA.
2. Re:Peculiar omission by Anonymous Coward · 2015-11-16 08:04 · Score: 0
  
  OpenCL could easily be emulated.
3. Re:Peculiar omission by UnknownSoldier · 2015-11-16 10:01 · Score: 1
  
  Whoa...
  You wouldn't happen to have a link for that by chance please?
4. Re:Peculiar omission by GiganticLyingMouth · 2015-11-16 10:11 · Score: 1
  
  Here you are... https://www.phoronix.com/scan....
5. Re:Peculiar omission by UnknownSoldier · 2015-11-16 10:26 · Score: 1
  
  Sweet. Thanks!
Architectural Similarities by gentryx · 2015-11-16 08:27 · Score: 1

The USP of AMD's APUs used to be having the GPU and the CPU on the same die. This is true for Jetson as well, but it is compatible with the whole CUDA universe, too. So now NVIDIA is eating AMD's lunch.

--
Computer simulation made easy -- LibGeoDecomp
1. Re:Architectural Similarities by serviscope_minor · 2015-11-16 09:21 · Score: 4, Informative
  
  The USP of AMD's APUs used to be having the GPU and the CPU on the same die.
  No, it's much more than that. It's not just on the same die, it's on the same side of the MMU as the CPU and the same side of the cache. This means you can pass data back and forth between the two units with a latency measured in nanoseconds, because you can simply hand over a pointer in the same memory space. I believe HSA also specifies things like atomics which are consistent across the CPU and GPU, as well as synchronisation primitives.
  In other words HSA is much more than just bolting a CPU and a GPU onto the same bus on a die.
  
  --
  SJW n. One who posts facts.
2. Re:Architectural Similarities by Anonymous Coward · 2015-11-16 10:06 · Score: 0
  
  Some HSA features have been available since 2012, but coherent memory has only been available since 2014 with Kaveri and the PS4, and GPU preemption and context switching since Carrizo this year. None of the chips with HSA features use less than 12W.
  That's why HSA isn't compared with ARM. AMD hasn't had the resources to put HSA features on their low-end chips, which are the ones that no one even buys.
  ARM is part of the HSA Foundation, but hasn't released any HSA designs yet. I don't know what Samsung's or Qualcomm's plans are.
3. Re:Architectural Similarities by serviscope_minor · 2015-11-16 10:59 · Score: 1
  
  Some HSA features have been available since 2012, but coherent memory has only been available since 2014 with Kaveri
  Yeah, Kaveri is where it got interesting, and where the LibreOffice benchmark with AMD trouncing everything else happened. Coherent memory turned it from a massive PITA into something much easier to program than normal GPGPU stuff. Lessons I learned from the supercomputer folks: latency is a killer.
  None of the chips with HSA features use less than 12W.
  That's comparable to the one in TFA: average use under 10W, but peak power is about 16W.
  
  --
  SJW n. One who posts facts.
Error on Page 7 ? by UnknownSoldier · 2015-11-16 10:15 · Score: 1

Looks like all the graphs on Page 7 have an incorrect TK1 label instead of the correct TX1?
http://www.phoronix.com/scan.p...
With nVidia getting serious about low power devices, the next few years are going to be very interesting as AMD, ARM, Broadcom, and Intel all duke it out.
I can't wait till OpenCL is supported as well.