NVIDIA Tegra X1 Performance Exceeds Intel Bay Trail SoCs, AMD AM1 APUs
An anonymous reader writes: An NVIDIA SHIELD Android TV modified to run Ubuntu Linux is providing interesting data on how NVIDIA's latest "Tegra X1" 64-bit ARM big.LITTLE SoC compares to various Intel/AMD/MIPS systems of varying form factors. Tegra X1 benchmarks on Ubuntu show strong performance with the X1 SoC in this $200 Android TV device, beating out low-power Intel Atom/Celeron Bay Trail SoCs and AMD AM1 APUs, and in some workloads even getting close to an Intel Core i3 "Broadwell" NUC. The Tegra X1 features Maxwell "GM20B" graphics and the total power consumption is less than 10 Watts.
The X1 uses a standard ARM Cortex A57 (specifically it's an A57/A53 big.LITTLE 4+4 config), so this says more about ARM's chip than anything nVidia did...
Now if you compared nVidia's Denver CPU, their in-house processor... The Denver is nearly twice as fast as the A57, but only comes in a dual-core config, so it's probably drawing a good deal more power. When you compare a quad-core A57 to a dual-core Denver, the A57 comes out slightly ahead in multicore benchmarks. Of course, single core performance is important too, so I'd be tempted to take a dual-core part over a quad-core if the dual-core had twice the performance per-core...
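To put rough numbers on the quad-vs-dual comparison above, here is a back-of-the-envelope sketch. The 1.9x per-core figure is only a placeholder for "nearly twice as fast", not a measurement, and it assumes throughput scales perfectly with core count, which real workloads rarely manage.

```python
# Back-of-the-envelope multicore comparison.
# ASSUMPTIONS (not measurements): A57 per-core speed is normalized to 1.0,
# Denver is taken as 1.9x that, and throughput scales linearly with cores.

def ideal_throughput(cores, per_core_speed):
    """Aggregate throughput under perfect scaling across cores."""
    return cores * per_core_speed

a57_quad = ideal_throughput(cores=4, per_core_speed=1.0)
denver_dual = ideal_throughput(cores=2, per_core_speed=1.9)

print(f"quad A57:    {a57_quad:.1f}")
print(f"dual Denver: {denver_dual:.1f}")
```

Under those assumptions the quad A57 edges ahead on aggregate throughput while the dual Denver keeps its large per-core lead, which is the trade-off described above.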
Why the X1 didn't use a variant of Denver isn't something that nVidia has said, but the assumption most make is that it wasn't ready for the die shrink to 20nm that the X1 entailed.
Look here at the compiler settings. The x86 processors are somewhat hampered by non-optimal settings. For example the i3 5010U is set to -mtune=generic. In my experience, that's basically going to default to AMD K8 optimization with no AVX/AVX2 support. The better option would be using -mtune=native or better yet -march=native, which would detect the CPU and produce a more optimized binary.
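One way to see what a given flag really enables is to ask GCC for its predefined macros. As far as I know, -mtune only affects tuning/scheduling, while the instruction set is governed by -march, so it's worth checking both rather than guessing. A minimal sketch, assuming gcc is on the PATH and a Unix-like system with /dev/null; the helper names here are made up for illustration.

```python
import subprocess

def parse_defines(preprocessor_output):
    """Collect macro names from `#define NAME value` lines."""
    names = set()
    for line in preprocessor_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names

def gcc_defines(*flags):
    """Run `gcc <flags> -dM -E -x c /dev/null` and return the macro names.

    Assumes gcc is installed; raises if the command fails.
    """
    result = subprocess.run(
        ["gcc", *flags, "-dM", "-E", "-x", "c", "/dev/null"],
        capture_output=True, text=True, check=True,
    )
    return parse_defines(result.stdout)

def simd_report(defines):
    """Report which SIMD feature macros are present in a set of defines."""
    return {m: m in defines for m in ("__SSE2__", "__AVX__", "__AVX2__")}
```

Comparing `simd_report(gcc_defines("-mtune=generic"))` with `simd_report(gcc_defines("-march=native"))` on the benchmark machine would show whether AVX/AVX2 code generation is actually switched on in each case.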
Actually, the point of having stable APIs and ABIs is so that other dependent binaries and code continue to work and compile even when changes are made to the underlying software.
Do VDPAU (Nvidia's video decode hardware acceleration API) drivers exist for this platform? In the past, I believe only the x86 binary blob drivers supported VDPAU.
If they exist, this would make an excellent MythTV DVR frontend device.
10W is incredibly hot for any sort of passively cooled, enclosed device.
The machine would be quite warm (almost hot) to the touch unless they use some inventive cooling. The current-gen Apple TV is about 6W, and your typical smartphone is around 2-3W.
There is a reason that NV has only really been able to get a foothold in tablets, Android TV, cars and their own Shield product. Simply put, their chips have historically been fast and hot. Great as an SoC within certain markets.
Interesting take-home from the benchmark: the AMD desktop processors did pretty respectably compared to the i7s. Usually a bit slower, sometimes actually faster, and we know an AMD setup is certainly cheaper.
Interesting that in the open source, repeatable, examinable benchmarks the difference between Intel and AMD is a lot less pronounced.
SJW n. One who posts facts.
I run Gentoo!!!
Besides that, I did some very recent Intel CPU benchmarking as I tried to figure out IPC gains over CPU generations. I ran my benchmarks on GCC 4.8/4.9/5.2 and LLVM 3.6 on Nehalem and Ivy Bridge. I also included march=generic vs march=native. Quick summary: For generic integer/floating-point code, the Intel Core-i7 CPUs don't actually benefit much from optimizations for newer architectures, especially on x86-64. The exception here is that 32-bit generic FPU x87 code is slower than SSE2, but the latter is always available in x86-64. Actually, sometimes GCC even produced worse code for march=native on Ivy Bridge.
The above actually makes sense to me. Starting from Nehalem, the internal CPU microarchitecture hasn't changed that much and the new instructions tend to be quite specific. Of course the newer generations have lots of small optimizations, more op execution units, bigger reorder buffers and caches, a bit faster ALUs and other units, and so on. But nothing drastic that would require a new instruction scheduler, for example. Pentium 4 was, of course, a completely different beast that tends to perform badly if the code is not targeted properly due to its excessive pipeline length.
OTOH, for specialized things such as video decoding/encoding, the libraries tend to do run-time CPU detection and use different code paths based on what is available. For example, FFmpeg does this (or at least mplayer did), and AFAIK OpenSSL does this for AES, too.
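The run-time detection pattern can be sketched roughly like this. To be clear, this is not how FFmpeg or OpenSSL implement it (they do it in C and assembly); it's a toy illustration that parses the feature flags Linux exposes in /proc/cpuinfo and picks a code path. The function names and the three "implementations" are hypothetical stand-ins.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text.

    x86 kernels label the line "flags"; ARM kernels label it "Features".
    Only the first matching line (the first CPU) is used.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()

# Hypothetical stand-ins for hand-tuned kernels; all compute the same sum.
def sum_generic(values):
    return sum(values)

def sum_sse2(values):
    return sum(values)

def sum_avx2(values):
    return sum(values)

def select_sum(flags):
    """Pick the most specialized variant the CPU claims to support."""
    if "avx2" in flags:
        return sum_avx2
    if "sse2" in flags:
        return sum_sse2
    return sum_generic

def read_cpu_flags(path="/proc/cpuinfo"):
    """Read flags from the running system; empty set if unavailable."""
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except OSError:
        return set()
```

Something like `select_sum(read_cpu_flags())([1, 2, 3])` then runs whichever variant matches the machine, so a single generic binary can still use the newer instructions where it matters.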
Bottom line: So, even if I'm a Gentoo user, I wouldn't worry too much about march=generic.
The more troubling question is why an application should feel forced to do anything in the face of a platform upgrade in order to work at all. A modern Windows desktop can still run 10-year-old software without a hiccup. Going back 20 years you start needing something like DOSBox to use a lot of the applications, though it's still doable. I haven't tried firing it up in a while, but last time I tried the commercial package of Quake 3 under Linux, it still worked on a modern distribution. Same for the Linux version of Neverwinter Nights. As an application maintainer for some Linux stuff, the only things that I can recall forcing my hand to change something for things to work were systemd and Python changes.
Android (and, to a significant but somewhat lesser extent, Apple) is not doing that well with respect to application and/or hardware compatibility with the past. It's a tiring situation for developers to have to follow an upgrade treadmill in order to cater to new system sales, just to keep the current applications workable as-is.
XML is like violence. If it doesn't solve the problem, use more.