NVIDIA Challenges Apple's iPad Benchmarks
MojoKid writes "At the iPad unveiling last week, Apple flashed up a slide claiming that the iPad 2 was 2x as fast as Nvidia's Tegra 3, while the new iPad would be 4x more powerful than Team Green's best tablet. NVIDIA's response boils down to: 'it's flattering to be compared to you, but how about a little data on which tests you ran and how you crunched the numbers?' NVIDIA is right to call Apple out on the meaningless nature of such a comparison, and the company is likely feeling a bit dogpiled given that TI was waving unverified webpage benchmarks around less than two weeks ago. That said, the Imagination Technologies (PowerVR) GPUs built into the iPad 2 and the new iPad both utilize tile-based rendering. In some ways, 2012 is a repeat of 2001: memory bandwidth is at an absolute premium because adding more bandwidth has a direct impact on power consumption. The GPU inside NVIDIA's Tegra 2 and Tegra 3 is a traditional chip, which means it's subject to significant overdraw, especially at higher resolutions. Apple's comparisons may be bogus, but the Tegra 3 bandwidth issues they indirectly point to aren't. It will be interesting to see NVIDIA's next move and what their rumored Tegra 3+ chip might bring."
The irony in this is that this is coming from a company that presented chunks of wood as their next-gen graphics cards.
Does using the tablet have smooth and instant responsiveness? At the end of the day, that's all that matters. Tegra 100 or ipad 100 won't matter if the OS that uses it isn't smooth and keeps up with the user interactions. Consumers just care about experience, how they get there isn't of interest to anyone other than nerds.
It's not like there aren't trade-offs and downsides to using tile-based. In the end, tile-based GPUs will be a footnote in history.
Want a look at what the A5X can do? Look at some PSVita games. Same GPU. You can even render at a lower resolution like 1024x768 and put that on the screen full-size.
We have no data to show that Apple didn't further bump up the memory bus size (they doubled it from A4 to A5).
I didn't know the PowerVR chips were still around. I had one of the early video cards based on the technology for my PC years ago. It worked ok, but that was long before things like shaders were an issue.
Still, we are talking about a portable device, so I'd think battery life would be more important than having the latest whizz-bang shaders. And just look at all the grief people are having with the Android lineup due to shader differences between vendors.
Thank God I focus on business programming, not video games. I've yet to hear of ANY tablet or smartphone having problems displaying graphs and charts.
I do not fail; I succeed at finding out what does not work.
Bought a Galaxy Tab for the Tegra 2, was so utterly disappointed. The real world performance was atrocious even compared to devices it was officially benchmarked better against. Sold it within 3 months. Still waiting on a great Android tablet....
Just ask Intel about Apple's benchmarking strategy: For years, the finest in graphic design publicly asserted that PPC was so bitchin' that it was pretty much just letting Intel and x86 live because killing your inferiors is bad taste. Then, one design win, and x86 is suddenly eleventy-billion percent faster than that old-and-busted PPC legacy crap.
Or ask Amazon: Amazon releases 'Kindle' e-reader device. His Steveness declares "Nobody reads". And now Apple is pushing books, newspapers, and their own pet proprietary publishing platform...
Cheer up, emo Nvidia, all you have to do is sell Apple a Tegra N SoC, or even just the rights to include your GPU in their AN SoC, and Tim Cook will personally explain to the world that PowerVR GPUs are slow, weak, make you 30% less creative and are produced entirely from conflict minerals.
Given any two devices X and Y, X is significantly faster than Y.
This confuses many people because in general usage of the word "faster" two different devices can't both be faster than the other. But it's the accepted industry standard.
The last TBDR vs. rasterizer wars were before the rasterizers added aggressive depth compression and hierarchical Z buffering solutions, which eliminated many of the advantages of the TBDR architecture, especially as triangle rates have risen (which have additional costs on a TBDR).
If TBDR was always a huge advantage, one of nvidia or ATI would surely have gone that way - why ignore a 'better' technology if it really is better?
It's just 'different' - under different scenes the two have somewhat different tradeoffs.
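The "different scenes, different tradeoffs" point can be illustrated with a toy model of overdraw for a single pixel. This is a sketch under loud assumptions: the function names are made up for illustration, all fragments are opaque, and it ignores hierarchical Z, depth compression, alpha blending, and per-tile binning cost, so it does not describe any real GPU.

```python
def immediate_mode_writes(depths):
    """Toy model: count color writes for one pixel when opaque fragments
    are depth-tested and written in submission order (smaller = nearer)."""
    writes = 0
    nearest = float("inf")
    for depth in depths:
        if depth < nearest:   # fragment passes the depth test
            nearest = depth
            writes += 1       # it is shaded and written to memory
    return writes


def deferred_writes(depths):
    """Toy model: visibility is resolved before shading, so at most one
    opaque fragment per pixel is shaded and written."""
    return 1 if depths else 0


back_to_front = [5, 4, 3, 2, 1]   # worst case for the immediate-mode model
front_to_back = [1, 2, 3, 4, 5]   # best case for the immediate-mode model

worst = immediate_mode_writes(back_to_front)   # every layer gets written
best = immediate_mode_writes(front_to_back)    # only the first layer
deferred = deferred_writes(back_to_front)      # order does not matter
```

In this model the gap between the two approaches depends entirely on draw order, which is one way to read the claim that neither is simply "better".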
I like my iPad1, though it's sluggish. I am (too) anxiously awaiting two new iPads due this Friday. I even kept the running commentary on the announcement up in a browser window (yes, I felt a bit dirty afterward). When I heard the proclamation of the speed difference, that certainly seemed to imply a 4-core processing using. At least, that was in the realm of possibility (4 CPU cores and 4 GPU cores vs the Tegra). I'm not convinced now that the claim is valid except for very special conditions with a host of caveats (using 2 CPU + 4 GPU to calculate GPU-assisted functions vs the 4 core Tegra CPU alone).
I agree 100% with your sentiment - and the responsiveness of the UI makes up for a lot of computational shortcomings in iOS devices. In fact, because the devices aren't meant for computationally intensive processes (protein folding, CFD/FEM analysis, bulk media recoding, etc.) the speed of the processor only needs to be fast enough not to be a hindrance to the use flow. Almost all of the media processing is so limited in format on iOS devices that encode/decode can be HW accelerated, precluding the need to do the killer ops in software. So, it may not matter how fast the A5X is, as long as it is "fast enough." Anything faster than real time won't matter to the user as long as it's real time ALL the time. But you can't just go make up numbers.
Is it just my observation, or are there way too many stupid people in the world?
Don't get me wrong, I love gaming on my iPad (or at least I like it enough to have no desire to get a PS Vita), but there are few games that truly push the GPU because there is just no money in it to do so. Until people are willing to pay $30-40 for a top-notch game on their mobile device, we won't see them.
And before someone says that touchscreens are another factor, please, that's only a problem with ports (or developers who think touchscreen games are just like console or handheld games without thinking (*cough*EA sports*cough*)). Fighting games that require you to hit a bunch of virtual buttons are wretched on a touch screen device. Fighting games like Infinity Blade are pretty fun because they take advantage of the touch screen, rather than treat the screen like a virtual controller. I actually did like GTA III, but I often had to find alternative ways to complete missions because running and gunning was more difficult than using the sniper rifle.
Nvidia is stupid enough to take the bait. Good job.
the old ipad 2 is faster than the tegra 3, according to Ars Technica, so it should make sense that the new ipad is even faster. i can't find the link but i saw it a few days ago, maybe here in a comment
We recently saw a graphics benchmark of the A5 vs the Tegra 3 posted to /., and the A5 beat the Tegra in real-world-ish benchmarks, and more than doubled its score in fill rate.
The A5X is basically just the A5 with twice as many GPU cores, and graphics problems tend to be embarrassingly parallel, so unless it scales up really poorly with those extra cores (due to shared bandwidth limitations, or poor geometry scaling) it should have no problem beating the Tegra 3 by 2x, especially in terms of fill rate.
And when you quadruple the number of pixels on your screen, as Apple just did, which measurement matters? Fill rate.
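The fill-rate argument above comes down to simple arithmetic. A minimal sketch follows; the 60 fps target and the overdraw factor of 2.0 are assumptions picked for illustration, not measured values, and `required_fill_rate` is a hypothetical helper.

```python
def pixel_count(width, height):
    """Number of pixels on a display of the given resolution."""
    return width * height


def required_fill_rate(width, height, fps, overdraw):
    """Pixels that must be written per second to redraw the whole screen
    `fps` times a second, touching each pixel `overdraw` times on average."""
    return width * height * fps * overdraw


ipad2_pixels = pixel_count(1024, 768)       # iPad 2 resolution
new_ipad_pixels = pixel_count(2048, 1536)   # new iPad resolution
pixel_ratio = new_ipad_pixels / ipad2_pixels

# Assumed workload: 60 frames per second, average overdraw of 2.0.
ipad2_fill = required_fill_rate(1024, 768, fps=60, overdraw=2.0)
new_ipad_fill = required_fill_rate(2048, 1536, fps=60, overdraw=2.0)

print(f"pixel ratio: {pixel_ratio:.1f}x")
print(f"fill needed, iPad 2:   {ipad2_fill / 1e6:.0f} Mpixels/s")
print(f"fill needed, new iPad: {new_ipad_fill / 1e6:.0f} Mpixels/s")
```

Whatever fps and overdraw you assume, the required fill rate scales with the pixel count, so the same workload needs four times the fill rate on the new screen.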
"The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge
Considering that these graphics benchmarks from Anandtech show the iPad 2 GPU handily beating a Tegra 3, it doesn't seem like much of a stretch that the iPad 3 GPU should beat it further.
Having back-in-the-day written a fair bit of code that ran on both PPC and Intel x86, including a bit of assembly for both, I'd agree that Apple's comparisons were more a work of marketing than engineering but PPC legitimately had its moments. Apple used phrases like "up to twice as fast" and there were certainly cases where this was true, however these tended to be very specialized situations where the underlying algorithm played to the natural strengths of the PPC architecture. Such cases do not represent the more general code and common algorithms. In general my recollection of those days is that PPC had about a 25% performance advantage over x86. However this advantage was nullified by Intel's ability to reach much higher clock rates.
Overall, as a Mac game developer, it took a bit of effort to get Mac ports on a par with their PC counterparts. One caveat here, emphasize "port" - that the games tended to have been written with only x86 in mind. Contrary to popular belief it is entirely possible to write code in high level languages that favors one architecture over the other, CISC or RISC, etc. So the x86 side may have had an advantage in that the code was naturally written to favor that architecture. However a counterpoint would be that we did profile extensively and re-write perfectly working original code where we thought we could leverage the PPC architecture. This included dropping down to assembly when compilers could not leverage the architecture properly. Still, this only achieved parity.
Again, note this was back-in-the-day, games that were not using a GPU. So it was more of a CPU v CPU comparison.
Apple doesn't play too loose with marketing statistics? You simply are forgetting the late PowerPC times where a water-cooled Apple system was slower than an air cooled Intel PC.
That is a bogus point. Those water cooled G5s were the standard shipping system. It's entirely fair to compare a stock Mac against a stock PC.
The real "engineering" of the PPC vs x86 comparison was through the benchmarking utility. IIRC Apple used a very old version of ByteMarks that was compiled/optimized for the 486 even though they were running on a Pentium at the time. When ByteMarks was recompiled to optimize for the Pentium the PPC advantage faded.
tegra smegma a5x tri-dual-octo-quad core ACME RX3200 Rocket Skates GigaHertzMegaPixelPerSecond my asshole graphics is irrelevant.
the ONLY thing that matters is how it works when its in your hands.
does it drive 2048x1536 at least as well as the ipad 2? yes or no.
the way i see it, neither NVIDIA nor Apple can say anything about relative performance because there is nothing using tegra at that resolution.. you can benchmark/extrapolate all you want, but all that matters is real world.
the "quad core A5X GPU" damn well better be faster because it's driving 4x as many pixels.
The Vita has a far smaller screen with a fraction of the pixels, that skews it even further. Then again, the Vita has to process more inputs.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
I wonder how well they both fare with heavy use of alpha blending. I know this will cause big problems for the tile based PowerVR chips.
-]Phreak Out[-
apple said its mac pro was 2.5x faster than its last g5. lets do the math....4x2.5GHz (last g5) =10 total GHz, next mac pro was 8x3.0GHz , 24GHz...funny, almost a linear scaling with MHz. Yeah intel was so much faster. They switched because of portables. Couldn't jam a water-cooled g5 into portable.
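The back-of-the-envelope math in that comment can be written out as follows. This is only the commenter's "cores times clock" metric made explicit; aggregate GHz is not a real performance measure, and the helper name is made up for illustration.

```python
def aggregate_ghz(cores, clock_ghz):
    """The comment's crude metric: core count multiplied by clock speed."""
    return cores * clock_ghz


last_g5 = aggregate_ghz(4, 2.5)    # 4 cores at 2.5 GHz
mac_pro = aggregate_ghz(8, 3.0)    # 8 cores at 3.0 GHz
clock_ratio = mac_pro / last_g5    # compare with the claimed 2.5x

print(f"{last_g5:.0f} GHz vs {mac_pro:.0f} GHz, ratio {clock_ratio:.1f}x")
```

By this metric the ratio is 2.4x, close to the 2.5x figure quoted in the comment.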
This was totally misleading, for any informed definition of misleading.
Just as there are embarrassingly parallel algorithms, there are embarrassingly wide instruction mixes. In the P6 architecture there was a three uop/cycle retirement gate, with a fat queue in front. If your instruction mix had any kind of stall (dependency chain, memory access, branch mispredict) the retirement usually caught up before the queue was filled. In the rare case (Steve Jobs' favorite Photoshop filter) where the instruction mix could sustain a retirement rate of 4 instructions per cycle, x86 showed badly against PPC. Conversely, on bumpy instruction streams full of execution hazards, x86 compared favourably since it had superior OOO head-room.
CoreDuo rebalanced the architecture primarily by adding a fair amount of micro-op fusing, so that one retirement slot effectively retired two instructions (without increasing the amount of retirement dependency checking in that pipeline stage). In some ways, the maligned x86 architecture starts to shine when your implementation adds the fancy trick of micro-op fusion, since the RMW addressing mode is fused at the instruction level. In RISC these instructions are split up into separate read and write portions. That was an asset at many lithographic nodes. But not at the CoreDuo node, as history recounts. Now x86 has caught up on the retirement side, and PPC is panting for breath on the fetch stream (juggling two instructions where x86 encodes only one).
The multitasking agility of x86 was also heavily and happily used. It happens not to show up in pure Photoshop kernels. Admittedly, SSE was pretty pathetic in the early incarnations. Intel decided to add it to the instruction set, but implemented it double pumped (two dispatch cycles per SSE operation). Of course they knew that future devices would double the dispatch width, so this was a way to crack the chicken and egg problem. Yeah, it was an ugly slow iterative process.
The advantage of PPC was never better than horses for courses, and PPC was picky about the courses. It really liked a groomed track.
x86 hardly gave a damn about a groomed track. It had deep OOO resources all the way through the cache hierarchy to main memory and back. The P6 was the generation where how you handled erratic memory latency mattered more for important workloads (ever heard of a server?) than the political correctness of your instruction encoding.
Apple never faltered in waving around groomed track benchmark numbers as if the average Mac user sat around and ran Photoshop blur filters 24 by 7. That was Apple's idea of a server workload.
mov eax, [esi]
inc eax
mov [esi], eax
That's a RISC program in x86 notation. Whether the first and second use of [esi] amounts to the same memory location as any other memory access that OOO might interleave is a big problem. That's a lot of hazard detection to do to maintain four-wide retirement.
Here is a CISC program in x86 notation. I can't show it to you in PPC notation, since PPC is a proper subset minus this feature.
inc [esi]
Clearly, with a clever implementation, you can arrange that the hazard check against potentially interleaved accesses to memory is performed once, not twice. It takes a lot of transistors to reach the blissful state of clever implementation. That's precisely the story of CoreDuo. It finally hit the bliss threshold (helped greatly that the Prescott people and their marketing overlords were busy walking the green plank).
Did Apple tell any of this story in vaguely the same way? Nooooo. It waved around one embarrassingly wide instruction stream that appealed to cool people until it turned blue in the face.
Cure for the blue face: make an about face.
Do I trust this new iPad 3 benchmark? Hahahahahaha. You know, I've never let out my inner six year old in 5000 posts, but it feels good.
http://www.youtube.com/watch?v=SvvcQpp3SYE&feature=youtube_gdata_player
So NVIDIA wants documentation about how Apple's hardware works? Funny, that.
Why not say the ipad is 100x faster than X? Apple stated during the ipad release that the iphone has the highest pixel density, but the HTC rezound is higher. http://en.wikipedia.org/wiki/List_of_displays_by_pixel_density
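The pixel-density comparison is easy to check by hand. A rough sketch follows, assuming nominal specs that are not stated in this thread: 960x640 on a 3.5 inch diagonal for the iPhone and 1280x720 on a 4.3 inch diagonal for the HTC Rezound. Quoted ppi figures can differ slightly because the nominal diagonal is rounded.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density: diagonal length in pixels divided by diagonal in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


# Assumed nominal specs (not taken from the thread).
iphone_ppi = pixels_per_inch(960, 640, 3.5)     # roughly 330
rezound_ppi = pixels_per_inch(1280, 720, 4.3)   # roughly 342

print(f"iPhone:  {iphone_ppi:.0f} ppi")
print(f"Rezound: {rezound_ppi:.0f} ppi")
```

Under those assumed specs the Rezound comes out denser, which is the point the comment is making.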
When I heard the proclamation of the speed difference, that certainly seemed to imply a 4-core processing using. At least, that was in the realm of possibility (4 CPU cores and 4 GPU cores vs the Tegra). I'm not convinced now that the claim is valid except for very special conditions with a host of caveats (using 2 CPU + 4 GPU to calculate GPU-assisted functions vs the 4 core Tegra CPU alone).
Two comments: First, please, please try to write more grammatical sentences. It's hard to parse out what you're trying to say. Second, during Apple's presentation, you seem to have missed a rather prominent bit of the slide with the benchmark results. You know, the label which said "Graphics", in large, easy-to-read letters. They never made a more general claim about CPU or system performance. That happened only in your head, because you weren't really paying attention to what they were saying!
What NVIDIA really wants is a sample of the benchmark so that they can tweak their drivers to fool the benchmark into producing a higher number; same as they do for the rest of the benchmarks.