NVIDIA Challenges Apple's iPad Benchmarks
MojoKid writes "At the iPad unveiling last week, Apple flashed up a slide claiming that the iPad 2 was 2x as fast as NVIDIA's Tegra 3, while the new iPad would be 4x more powerful than Team Green's best tablet. NVIDIA's response boils down to: 'it's flattering to be compared to you, but how about a little data on which tests you ran and how you crunched the numbers?' NVIDIA is right to call Apple out on the meaningless nature of such a comparison, and the company is likely feeling a bit dogpiled given that TI was waving unverified webpage benchmarks around less than two weeks ago. That said, the Imagination Technologies (PowerVR) GPUs built into the iPad 2 and the new iPad both utilize tile-based rendering. In some ways, 2012 is a repeat of 2001: memory bandwidth is at an absolute premium because adding more bandwidth has a direct impact on power consumption. The GPU inside NVIDIA's Tegra 2 and Tegra 3 is a traditional chip, which means it's subject to significant overdraw, especially at higher resolutions. Apple's comparisons may be bogus, but the Tegra 3 bandwidth issues they indirectly point to aren't. It will be interesting to see NVIDIA's next move and what their rumored Tegra 3+ chip might bring."
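To make the overdraw point in the summary concrete, here is a toy sketch, not a model of either GPU. It assumes opaque full-colour layers drawn back to front with no early depth rejection (the worst case for a traditional renderer), and a tile-based deferred renderer that resolves visibility before shading. The function name `count_fragment_work` and the 64x48 screen are made up for illustration.

```python
def count_fragment_work(width, height, layers):
    """Count shaded fragments for a stack of opaque rectangles.

    layers: list of (x0, y0, x1, y1) rectangles in draw order (back to front).
    Returns (immediate, deferred):
      immediate - every fragment of every layer is shaded and written
                  (assumed worst case: no early depth rejection)
      deferred  - visibility is resolved first, so each covered pixel
                  is shaded once (idealised tile-based deferred renderer)
    """
    immediate = 0
    covered = set()
    for x0, y0, x1, y1 in layers:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                immediate += 1
                covered.add((x, y))
    deferred = len(covered)
    return immediate, deferred


# Hypothetical scene: three full-screen layers on a tiny 64x48 screen.
WIDTH, HEIGHT = 64, 48
scene = [(0, 0, WIDTH, HEIGHT)] * 3
immediate, deferred = count_fragment_work(WIDTH, HEIGHT, scene)
overdraw = immediate / deferred
print(immediate, deferred, overdraw)
```

Under those assumptions the traditional path does three times the fragment work for the same final image, and that gap grows with resolution and scene depth. Real hardware on both sides has tricks this toy ignores.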
The irony in this is that this is coming from a company that presented chunks of wood as their next-gen graphics cards.
Does using the tablet have smooth and instant responsiveness? At the end of the day, that's all that matters. Tegra 100 or ipad 100 won't matter if the OS that uses it isn't smooth and keeps up with the user interactions. Consumers just care about experience, how they get there isn't of interest to anyone other than nerds.
I didn't know the PowerVR chips were still around. I had one of the early video cards based on the technology for my PC years ago. It worked ok, but that was long before things like shaders were an issue.
Still, we are talking about a portable device, so I'd think battery life would be more important than having the latest whizz-bang shaders. And just look at all the grief people are having with the Android lineup due to shader differences between vendors.
Thank God I focus on business programming, not video games. I've yet to hear of ANY tablet or smartphone having problems displaying graphs and charts.
I do not fail; I succeed at finding out what does not work.
Bought a Galaxy Tab for the Tegra 2, was so utterly disappointed. The real world performance was atrocious even compared to devices it was officially benchmarked better against. Sold it within 3 months. Still waiting on a great Android tablet....
Just ask Intel about Apple's benchmarking strategy: For years, the finest in graphic design publicly asserted that PPC was so bitchin' that it was pretty much just letting Intel and x86 live because killing your inferiors is bad taste. Then, one design win, and x86 is suddenly eleventy-billion percent faster than that old-and-busted PPC legacy crap.
Or ask Amazon: Amazon releases 'Kindle' e-reader device. His Steveness declares "Nobody reads". And now Apple is pushing books, newspapers, and their own pet proprietary publishing platform...
Cheer up, emo Nvidia, all you have to do is sell Apple a Tegra N SoC, or even just the rights to include your GPU in their AN SoC, and Tim Cook will personally explain to the world that PowerVR GPUs are slow, weak, make you 30% less creative and are produced entirely from conflict minerals.
I like my iPad1, though it's sluggish. I am (too) anxiously awaiting two new iPads due this Friday. I even kept the running commentary on the announcement up in a browser window (yes, I felt a bit dirty afterward). When I heard the proclamation of the speed difference, that certainly seemed to imply a 4-core processor. At least, that was in the realm of possibility (4 CPU cores and 4 GPU cores vs the Tegra). I'm not convinced now that the claim is valid except for very special conditions with a host of caveats (using 2 CPU + 4 GPU to calculate GPU-assisted functions vs the 4-core Tegra CPU alone).
I agree 100% with your sentiment - and the responsiveness of the UI makes up for a lot of computational shortcomings in iOS devices. In fact, because the devices aren't meant for computationally intensive processes (protein folding, CFD/FEM analysis, bulk media recoding, etc.) the speed of the processor only needs to be fast enough not to be a hindrance to the use flow. Almost all of the media processing is so limited in format on iOS devices that encode/decode can be HW accelerated, precluding the need to do the killer ops in software. So, it may not matter how fast the A5X is, as long as it is "fast enough." Anything faster than real time won't matter to the user as long as it's real time ALL the time. But you can't just go make up numbers.
Is it just my observation, or are there way too many stupid people in the world?
Don't get me wrong, I love gaming on my iPad (or at least I like it enough to have no desire to get a PS Vita), but there are few games that truly push the GPU because there is just no money in it to do so. Until people are willing to pay $30-40 for a top-notch game on their mobile device, we won't see them.
And before someone says that touchscreens are another factor: please, that's only a problem with ports (or with developers who treat touchscreen games just like console or handheld games without thinking, *cough*EA Sports*cough*). Fighting games that require you to hit a bunch of virtual buttons are wretched on a touchscreen device. Fighting games like Infinity Blade are pretty fun because they take advantage of the touchscreen rather than treating it like a virtual controller. I actually did like GTA III, but I often had to find alternative ways to complete missions because running and gunning was more difficult than using the sniper rifle.
The Gish Bar Times - Blog covering Jupiter's moon Io
We recently saw a graphics benchmark of the A5 vs the Tegra 3 posted to /., and the A5 beat the Tegra in real-world-ish benchmarks, and more than doubled its score in fill rate.
The A5X is basically just the A5 with twice as many GPU cores, and graphics problems tend to be embarrassingly parallel, so unless it scales up really poorly with those extra cores (due to shared bandwidth limitations, or poor geometry scaling) it should have no problem beating the Tegra 3 by 2x, especially in terms of fill rate.
And when you quadruple the number of pixels on your screen, as Apple just did, which measurement matters? Fill rate.
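The pixel arithmetic behind that is easy to check. A rough sketch, assuming the iPad 2's 1024x768 panel, the 2048x1536 panel mentioned elsewhere in this thread, and a 60 fps redraw target; the function name `required_fill_rate` and the overdraw parameter are my own, not taken from any benchmark.

```python
def required_fill_rate(width, height, fps=60, overdraw=1.0):
    """Pixels per second the GPU must write to redraw every frame.

    overdraw is the average number of times each pixel gets written
    per frame (1.0 means every pixel is touched exactly once).
    """
    return width * height * fps * overdraw


ipad2 = required_fill_rate(1024, 768)      # assumed iPad 2 panel
new_ipad = required_fill_rate(2048, 1536)  # panel size quoted in the thread
pixel_ratio = new_ipad / ipad2

print(f"iPad 2:   {ipad2:,.0f} pixels/s")
print(f"new iPad: {new_ipad:,.0f} pixels/s")
print(f"ratio:    {pixel_ratio:.1f}x")
```

So just to stand still at the same frame rate, the new panel needs four times the fill rate, before any overdraw is counted.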
"The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge
Considering that these graphics benchmarks from Anandtech show the iPad 2 GPU handily beating a Tegra 3, it doesn't seem like much of a stretch that the iPad 3 GPU should beat it further.
Having back-in-the-day written a fair bit of code that ran on both PPC and Intel x86, including a bit of assembly for both, I'd agree that Apple's comparisons were more a work of marketing than engineering, but PPC legitimately had its moments. Apple used phrases like "up to twice as fast" and there were certainly cases where this was true; however, these tended to be very specialized situations where the underlying algorithm played to the natural strengths of the PPC architecture. Such cases do not represent the more general code and common algorithms. In general my recollection of those days is that PPC had about a 25% performance advantage over x86. However, this advantage was nullified by Intel's ability to reach much higher clock rates.
Overall, as a Mac game developer, it took a bit of effort to get Mac ports on a par with their PC counterparts. One caveat here, with the emphasis on "port": the games tended to have been written with only x86 in mind. Contrary to popular belief it is entirely possible to write code in high-level languages that favors one architecture over the other, CISC or RISC, etc. So the x86 side may have had an advantage in that the code was naturally written to favor that architecture. However, a counterpoint would be that we did profile extensively and rewrite perfectly working original code where we thought we could leverage the PPC architecture. This included dropping down to assembly when compilers could not leverage the architecture properly. Still, this only achieved parity.
Again, note this was back-in-the-day, games that were not using a GPU. So it was more of a CPU v CPU comparison.
tegra smegma a5x tri-dual-octo-quad core ACME RX3200 Rocket Skates GigaHertzMegaPixelPerSecond my asshole graphics is irrelevant.
the ONLY thing that matters is how it works when it's in your hands.
does it drive 2048x1536 at least as well as the ipad 2? yes or no.
the way i see it, neither NVIDIA nor Apple can say anything about relative performance because there is nothing using tegra at that resolution.. you can benchmark/extrapolate all you want, but all that matters is real world.
the "quad core A5X GPU" damn well better be faster because it's driving 4x as many pixels.
This was totally misleading, for any informed definition of misleading.
Just as there are embarrassingly parallel algorithms, there are embarrassingly wide instruction mixes. In the P6 architecture there was a three uop/cycle retirement gate, with a fat queue in front. If your instruction mix had any kind of stall (dependency chain, memory access, branch mispredict) the retirement usually caught up before the queue was filled. In the rare case (Steve Jobs' favorite Photoshop filter) where the instruction mix could sustain a retirement rate of 4 instructions per cycle, x86 showed badly against PPC. Conversely, on bumpy instruction streams full of execution hazards, x86 compared favourably since it had superior OOO head-room.
CoreDuo rebalanced the architecture primarily by adding a fair amount of micro-op fusing, so that one retirement slot effectively retired two instructions (without increasing the amount of retirement dependency checking in that pipeline stage). In some ways, the maligned x86 architecture starts to shine when your implementation adds the fancy trick of micro-op fusion, since the RMW addressing mode is fused at the instruction level. In RISC these instructions are split up into separate read and write portions. That was an asset at many lithographic nodes. But not at the CoreDuo node, as history recounts. Now x86 has caught up on the retirement side, and PPC is panting for breath on the fetch stream (juggling two instructions where x86 encodes only one).
The multitasking agility of x86 was also heavily and happily used. It happens not to show up in pure Photoshop kernels. Admittedly, SSE was pretty pathetic in the early incarnations. Intel decided to add it to the instruction set, but implemented it double pumped (two dispatch cycles per SSE operation). Of course they knew that future devices would double the dispatch width, so this was a way to crack the chicken and egg problem. Yeah, it was an ugly slow iterative process.
The advantage of PPC was never better than horses for courses, and PPC was picky about the courses. It really liked a groomed track.
x86 hardly gave a damn about a groomed track. It had deep OOO resources all the way through the cache hierarchy to main memory and back. The P6 was the generation where how you handled erratic memory latency mattered more for important workloads (ever heard of a server?) than the political correctness of your instruction encoding.
Apple never faltered in waving around groomed track benchmark numbers as if the average Mac user sat around and ran Photoshop blur filters 24 by 7. That was Apple's idea of a server workload.
mov eax, [esi]
inc eax
mov [esi], eax
That's a RISC program in x86 notation. Whether the first and second use of [esi] amounts to the same memory location as any other memory access that OOO might interleave is a big problem. That's a lot of hazard detection to do to maintain four-wide retirement.
Here is a CISC program in x86 notation. I can't show it to you in PPC notation, since PPC is a proper subset minus this feature.
inc [esi]
Clearly, with a clever implementation, you can arrange that the hazard check against potentially interleaved accesses to memory is performed once, not twice. It takes a lot of transistors to reach the blissful state of clever implementation. That's precisely the story of CoreDuo. It finally hit the bliss threshold (helped greatly by the fact that the Prescott people and their marketing overlords were busy walking the green plank).
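A back-of-the-envelope way to tally the two listings above, as a toy bookkeeping sketch and nothing like a pipeline simulator. It assumes one retirement slot per listed instruction (so the fused read-modify-write retires as a single op) and one memory hazard check per instruction that references [esi]. The names `RISC_STYLE`, `CISC_STYLE`, `memory_hazard_checks` and `retirement_slots` are invented here.

```python
# Each entry: (instruction text, does it reference memory?)
RISC_STYLE = [
    ("mov eax, [esi]", True),   # load
    ("inc eax", False),         # modify in register
    ("mov [esi], eax", True),   # store
]
CISC_STYLE = [
    ("inc [esi]", True),        # read-modify-write in one instruction
]


def memory_hazard_checks(program):
    """Assumed cost: one address hazard check per memory-referencing op."""
    return sum(1 for _, touches_memory in program if touches_memory)


def retirement_slots(program):
    """Assumed cost: one retirement slot per listed instruction."""
    return len(program)


for name, program in (("RISC-style", RISC_STYLE), ("CISC-style", CISC_STYLE)):
    print(name, retirement_slots(program), "slots,",
          memory_hazard_checks(program), "hazard checks")
```

Under those assumptions the split sequence costs three slots and two checks against one and one for the fused form, which is the whole argument in miniature. How close a real core gets to that ideal is exactly the "lot of transistors" part.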
Did Apple tell any of this story in vaguely the same way? Nooooo. It waved around one embarrassingly wide instruction stream that appealed to cool people until it turned blue in the face.
Cure for the blue face: make an about face.
Do I trust this new iPad 3 benchmark? Hahahahahaha. You know, I've never let out my inner six year old in 5000 posts, but it feels good.
That's also what they said in the late 90's when the PowerVR was competing with the 3Dfx Voodoo add-in cards. Given that there have been at least 50 million PowerVR-based GPUs shipped so far that's a heck of a footnote.
^I'm with stupid.^