NVIDIA Tegra X1 Performance Exceeds Intel Bay Trail SoCs, AMD AM1 APUs
An anonymous reader writes: An NVIDIA SHIELD Android TV modified to run Ubuntu Linux is providing interesting data on how NVIDIA's latest "Tegra X1" 64-bit ARM big.LITTLE SoC compares to various Intel/AMD/MIPS systems of varying form factors. Tegra X1 benchmarks on Ubuntu show strong performance with the X1 SoC in this $200 Android TV device, beating out low-power Intel Atom/Celeron Bay Trail SoCs and AMD AM1 APUs, and in some workloads even getting close to an Intel Core i3 "Broadwell" NUC. The Tegra X1 features Maxwell "GM20B" graphics and the total power consumption is less than 10 Watts.
no thanks...
The X1 uses a standard ARM Cortex A57 (specifically it's an A57/A53 big.LITTLE 4+4 config), so this says more about ARM's chip than anything nVidia did...
Now if you compared nVidia's Denver CPU, their in-house processor... The Denver is nearly twice as fast as the A57, but only comes in a dual-core config, so it's probably drawing a good deal more power. When you compare a quad-core A57 to a dual-core Denver, the A57 comes out slightly ahead in multicore benchmarks. Of course, single core performance is important too, so I'd be tempted to take a dual-core part over a quad-core if the dual-core had twice the performance per-core...
Why the X1 didn't use a variant of Denver isn't something that nVidia has said, but the assumption most make is that it wasn't ready for the die shrink to 20nm that the X1 entailed.
OMG, that video card is like, totally awesome! Less than 10 watts is super for that crazy power!
How does it compare to like PCI express cards?
Look here at the compiler settings. The x86 processors are somewhat hampered by non-optimal settings. For example the i3 5010U is set to -mtune=generic. In my experience, that's basically going to default to AMD K8 optimization with no AVX/AVX2 support. The better option would be using -mtune=native or better yet -march=native, which would detect the CPU and produce a more optimized binary.
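For anyone who wants to check what these flags actually resolve to on their own box, here is a minimal sketch. It assumes a `gcc` binary on the PATH and relies on `gcc -Q --help=target`, which prints the target options in effect. Worth keeping in mind: `-mtune` only steers scheduling and tuning decisions, while `-march` decides which instructions the compiler may emit. The helper names (`parse_target_option`, `resolved_flags`) are made up for this example, and the exact column layout of GCC's output is an assumption that may differ between versions.

```python
import shutil
import subprocess


def parse_target_option(help_text, option):
    """Return the value listed for `option` (e.g. "-march=") in the text
    printed by `gcc -Q --help=target`, or None if it is not listed."""
    for line in help_text.splitlines():
        parts = line.split()
        # Assumed layout: option name, whitespace, then the resolved value.
        if len(parts) >= 2 and parts[0] == option:
            return parts[1]
    return None


def resolved_flags(extra_flag):
    """Ask the local gcc (if any) what -march/-mtune resolve to when
    `extra_flag` is passed. Returns an empty dict when gcc is missing."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return {}
    result = subprocess.run(
        [gcc, extra_flag, "-Q", "--help=target"],
        capture_output=True,
        text=True,
        check=False,
    )
    return {
        opt: parse_target_option(result.stdout, opt)
        for opt in ("-march=", "-mtune=")
    }


if __name__ == "__main__":
    for flag in ("-mtune=generic", "-march=native"):
        print(flag, "->", resolved_flags(flag))
```

Comparing the two printed lines shows whether the generic build and the native build target a different instruction set or differ only in tuning.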
I commend the troll effort here. You've got most of the themes woven into one post.
SoC ain't shit.
What incompetence led Intel to use a temporally relative name? It's on par with 'new' in the product name. Seems to work OK until it doesn't and looks idiotic in retrospect.
Do VDPAU (Nvidia's video decode hardware acceleration API) drivers exist for this platform? In the past, I believe only the x86 binary blob drivers supported VDPAU.
If they exist, this would make an excellent MythTV DVR frontend device.
How "modified" was the Shield to run Ubuntu? Can I buy a Shield and get Ubuntu on it today? Or is this benchmarking an exercise in futility?
10 times out of 10 NVidia non-GPU chip benchmarks are paid for by NVidia and are complete bullshit, designed to get fanboys to buy their latest chip. There have been no exceptions with Tegra to date.
it does worse? You just found out that the benchmark is bullshit.
Where are you finding the compiler flags? I can't see -mtune anywhere on that page.
Well, with the possible exception of those running Gentoo, 99% of end users will be running precompiled software that has to be compiled for a generic CPU, as the distributor doesn't know exactly what type of processor it's going to end up running on.
This is exactly why the benchmarks include
1) a way to repeat the benchmarks as described in the article (see page 4): 'phoronix-test-suite benchmark 1507285-BE-POWERLOW159'.
2) The compiler options are included
Armed with those two pieces of information, you can go and "prove" that the benchmark is, as you called it, bullshit. Yet rarely, if ever that I am aware of, does anyone respond to an article with those two pieces of information and say: "here, if you run it in this mode, you will see a marked difference in performance".
As Bert64 says in the response to the grandparent, 99% of end users will be running the software - either pre-compiled by their distribution vendor in this way, or compiled by the benchmark author's defaults. If you really want to prove that the benchmark is crap, then by all means make meaningful suggestions to _any_ of the existing machine benchmarks.
Michael (Phoronix) and I had some interesting discussions with Sun (pre-Oracle) about 64 vs 32 bit. They argued that the benchmarks were misleading because they did not use the Sun Studio (IIRC) compiler for 64 bit. In the discussions, it became clear, even to the Sun people, that it was quite difficult even for them to configure and use the platform the way they wanted Phoronix to. Out of the box, gcc compiling for 32 bit was how they configured the systems. And guess what, the './configure;make;make install' triplet would compile the same way we did.
Full disclosure, I have a long history with Phoronix, and have been involved in work that they have done in the past.
10W is incredibly hot for any sort of passively cooled, enclosed device.
The machine would be quite warm (almost hot) to the touch unless they use some inventive cooling. The current Gen Apple TV is about 6W, and your typical smartphone is around 2-3 W.
There is a reason that NV has only really been able to get a foothold in tablets, Android TV, cars and their own Shield product. Quite simply put, they have historically been fast and hot. Great as an SoC within certain markets.
Interesting take-home from the benchmark: the AMD desktop processors did pretty respectably well compared to the i7s. Usually a bit slower, sometimes actually faster, and we know an AMD setup is certainly cheaper.
Interesting that in the open source, repeatable, examinable benchmarks the difference between Intel and AMD is a lot less pronounced.
SJW n. One who posts facts.
.. and succeeded at being un-funny
Time for bed, said Zebedee - boing
I run Gentoo!!!
Besides that, I did some very recent Intel CPU benchmarking as I tried to figure out IPC gains over CPU generations. I ran my benchmarks on GCC 4.8/4.9/5.2 and LLVM 3.6 on Nehalem and Ivy Bridge. I also included march=generic vs march=native. Quick summary: For generic integer/floating-point code, the Intel Core-i7 CPUs don't actually benefit much from optimizations for newer architectures, especially on x86-64. The exception here is that 32-bit generic FPU x87 code is slower than SSE2, but the latter is always available in x86-64. Actually, sometimes GCC even produced worse code for march=native on Ivy Bridge.
The above actually makes sense to me. Starting from Nehalem, the internal CPU microarchitecture hasn't changed that much and the new instructions tend to be quite specific. Of course the newer generations have lots of small optimizations, more op execution units, bigger reorder buffers and caches, a bit faster ALUs and other units, and so on. But nothing drastic that would require a new instruction scheduler, for example. Pentium 4 was, of course, a completely different beast that tends to perform badly if the code is not targeted properly due to its excessive pipeline length.
OTOH, for specialized things such as video decoding/encoding, the libraries tend to do run-time CPU detection and use different code paths based on what is available. For example, FFmpeg does this (or at least mplayer did), and AFAIK OpenSSL does this for AES, too.
Bottom line: So, even if I'm a Gentoo user, I wouldn't worry too much about march=generic.
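The run-time CPU detection mentioned above can be sketched roughly like this. It is only an illustration of the dispatch idea, not how FFmpeg or OpenSSL actually implement it (they query the CPU directly in C/assembly). The sketch assumes a Linux-style `/proc/cpuinfo`, where x86 kernels list features under `flags` and ARM kernels under `Features`; the function names and the preference list are hypothetical.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags from a Linux-style cpuinfo file as a set.
    Returns an empty set if the file is unreadable or has no flags line."""
    try:
        with open(path) as cpuinfo:
            for line in cpuinfo:
                key, _, value = line.partition(":")
                # x86 kernels use "flags", ARM kernels use "Features".
                if key.strip() in ("flags", "Features"):
                    return set(value.split())
    except OSError:
        pass
    return set()


# Hypothetical preference order: the most capable code path comes first.
PREFERRED_PATHS = ("avx2", "sse2")


def pick_code_path(flags):
    """Pick the best code path the CPU supports, falling back to "generic".
    In a real library each name would map to a hand-tuned routine."""
    for feature in PREFERRED_PATHS:
        if feature in flags:
            return feature
    return "generic"


if __name__ == "__main__":
    print("selected code path:", pick_code_path(read_cpu_flags()))
```

The point is that the binary itself can stay generic: the choice of fast path happens once at startup, which is why the compile-time target matters less for this kind of workload.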
>If you really want to prove that the benchmark is crap, then by all means make meaningful suggestions to _any_ of the existing machine benchmarks.
That's a bit facetious. If you've been around the benchmarking world as long as you say you have, you'll know that the compiler settings are *always* a cause of controversy.
Nobody is happy when compiler settings are made that don't favor their side (whatever it is).
To say an ARM beats an Atom or Celeron is not saying that much. Even coming close to a Core i3 is just saying it can hold its own with basic chips.
These days, if you want an enjoyable notebook, tablet or desktop experience, you have to go Core i5 or above. Especially if you're talking low-powered chips.
Nothing against ARM in small-screen tablets, readers, or specific-task OSes like a game console. Even a Chromebook sucks more with an ARM chip than an Intel one.
Gentoo users are funny. Doing cpu benchmarks... No recent CPUs...
On this page there is an iframe (http://www.phoronix.com/scan.php?page=article&item=nvidia-tegra-x1&num=2)
to https://openbenchmarking.org/e...
No mention of Bitcoin, 3D printing, Arduino or Raspberry Pi.
Lame.
The problem being they didn't do the same to ARM. Either that argument applies to both sides or neither. They need to be held to the same standard.
XML is like violence. If it doesn't solve the problem, use more.
Well, these just happened to be on my Gentoo boxes, and therefore, of interest.
NVidia should have spent more money on engineering and less on advertising. All the Tegra chip sets overpromised and underdelivered. I see no reason why this one should be different.
It was the stock compiler on each OS.... Thus it's the upstream distro not doing that.