AMD Catalyst Driver To Enable Mantle, Fix Frame Pacing, Support HSA For Kaveri
MojoKid writes "AMD has a new set of drivers coming in a couple of days that are poised to resolve a number of longstanding issues and enable a handful of new features as well, most notably support for Mantle. AMD's new Catalyst 14.1 beta driver is going to be the first publicly available driver from AMD that will support Mantle, AMD's "close to the metal" API that will let developers wring additional performance from GCN-based GPUs. However, the new drivers will also add support for the HSA-related features introduced with the recently released Kaveri APU, and will reportedly fix the frame pacing issues associated with Radeon HD 7000 series CrossFire configurations. A patch for Battlefield 4 is due to arrive soon as well, and AMD is claiming performance gains in excess of 40 percent in CPU-limited scenarios but smaller gains in GPU-limited conditions, with average gains of 11 to 13 percent overall."
First time accepted submitter Spottywot adds some details about the Battlefield 4 improvements, writing that Johan Andersson, one of the Technical Directors in the Frostbite team, says that the best performance gains are observed when a game is bottlenecked by the CPU, "which can be quite common even on high-end machines." "With an AMD A10-7850K 'Kaveri' APU Mantle provides a 14 per cent improvement, on a system with an AMD FX-8350 and Radeon 7970 Mantle provides a 25 per cent boost, while on an Intel Core i7-3970x Extreme system with 2x AMD Radeon R9 290x cards a huge 58 per cent performance increase was observed."
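As a rough illustration of why the claimed gains are largest when the CPU is the bottleneck, here is a minimal sketch. It assumes CPU and GPU work overlap perfectly, so frame time is simply the slower of the two. The millisecond figures and the 0.6 scaling factor are made up for illustration; they are not AMD or DICE measurements, and `fps` and `gain_pct` are hypothetical helper names.

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Frames per second, assuming CPU and GPU work overlap perfectly,
    so each frame takes as long as the slower of the two."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def gain_pct(cpu_ms: float, gpu_ms: float, cpu_scale: float) -> float:
    """Percent FPS gain when per-frame CPU time is multiplied by cpu_scale
    (for example, 0.6 means the CPU side of a frame takes 40% less time)."""
    before = fps(cpu_ms, gpu_ms)
    after = fps(cpu_ms * cpu_scale, gpu_ms)
    return (after / before - 1.0) * 100.0


# Hypothetical per-frame timings in milliseconds (illustrative only).
scenarios = {
    "CPU-bound": (20.0, 10.0),
    "GPU-bound": (8.0, 16.0),
    "slightly CPU-bound": (18.0, 16.0),
}

for name, (cpu_ms, gpu_ms) in scenarios.items():
    print(f"{name}: {gain_pct(cpu_ms, gpu_ms, 0.6):.1f}% faster")
```

Under these assumed numbers the CPU-bound case gains about 67 percent, the GPU-bound case gains nothing, and the slightly CPU-bound case gains 12.5 percent because the GPU becomes the new limit. Real frames do not overlap this cleanly, so treat this as a model of the trend only.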
Some games that were really slow on my system, like Star Wars: The Old Republic, have improved in later patches as they now spread the tasks across all 6 CPU cores.
However, there is sometimes lag when CPU usage is at only 40%. This is because of synchronization, with each core waiting on the other to finish something. That is one of the drawbacks of parallelization and why Intel's Itanium failed. Great for servers, but anything where data needs to be exchanged between the different parts of the program via threads hits bottlenecks.
So the icore7 uses crazy mathematical algorithms to execute data before it even arrives, saving bandwidth and getting insane IPC, which is why AMD can't compete. But if you have a heavily threaded app that is latency-sensitive, like a game, it can be choppy even with low CPU utilization.
There is so much wrong with this post.
First, Itanium didn't fail due to difficulties in parallelizing things. Software never ramped up due to low market penetration and the fact that they shoved instruction scheduling back onto the compiler writers, it had poor performance for x86 code, and it was never targeted at anything but big server iron. It was never intended to be a consumer-level chip. Another big reason Itanium failed was the introduction of AMD64.
Secondly, the anecdotal latency that you experience in SWTOR, even though CPU utilization is only 40%, is unlikely to be due to "core waiting on the other to finish something," and I challenge you to present a thread dump or profiler trace illustrating such a block correlated with your metric for latency. If you have not done an in-depth analysis you'd have no way to know, but if you did I'd be curious as to your findings.
Finally, I have no idea why you would think that the i7 (I assume that's what you meant by icore7) "execute[s] data before it arrives." That doesn't even make sense. What you are most likely referring to is out-of-order execution, or possibly branch prediction, both features that are also present in AMD chips and in earlier Intel chips going back to the Pentium Pro. The better IPC of the i7 certainly has nothing to do with magical future-seeing math and more to do with better execution units, OoO execution resources, and superior FPU hardware.
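For anyone wondering what branch prediction amounts to in practice, here is a toy sketch of a single two-bit saturating counter, a classic textbook scheme. It is an illustration only: it is not how any particular Intel or AMD chip is implemented, and the function name and branch patterns are made up for the example.

```python
def two_bit_accuracy(outcomes, state=2):
    """Fraction of branches predicted correctly by one 2-bit saturating counter.

    outcomes: sequence of bools, True meaning the branch was taken.
    state: counter value 0..3; values 2 and 3 predict "taken".
    """
    correct = 0
    total = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        total += 1
        # Move toward 3 on a taken branch, toward 0 otherwise, saturating.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / total if total else 0.0


# A loop-style branch: taken nine times, then falls through, repeated.
loop_branch = ([True] * 9 + [False]) * 10
# A branch that flips direction every time.
alternating = [True, False] * 50

print(two_bit_accuracy(loop_branch))
print(two_bit_accuracy(alternating))
```

With the counter starting at 2, the loop-style pattern is predicted correctly 90 percent of the time and the alternating pattern only 50 percent. The point is that the hardware guesses from past behavior; nothing is executed on data that has not arrived.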
It is true that in general games have not been able to scale to use 100% of your CPU 100% of the time, but it's not for the reason that you have stated, and I'm quite doubtful that threading has introduced the type of latency a human would notice in the way you describe. There is a latency/throughput trade-off, but it is quite possible to achieve superior frame latencies with multiple cores than with a single core.
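To put a number on the scaling point, here is a back-of-the-envelope sketch using an Amdahl-style model, which is my assumption and not a measurement of SWTOR or any other game. It treats a fixed share of each frame's work as serial (one core runs while the rest idle) and the remainder as perfectly parallel; `speedup` and `utilization` are hypothetical helper names.

```python
def speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when serial_fraction of the
    work cannot be spread across cores."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def utilization(serial_fraction: float, cores: int) -> float:
    """Average utilization across all cores under the same model:
    useful work done divided by (cores * elapsed time)."""
    return speedup(serial_fraction, cores) / cores


# Example: 6 cores with 30% of the frame's work being serial (assumed value).
print(f"speedup: {speedup(0.3, 6):.2f}x")
print(f"utilization: {utilization(0.3, 6):.0%}")
```

In this model, 30 percent serial work on 6 cores gives roughly 40 percent average utilization, yet each frame still finishes about 2.4 times sooner than on one core. So low utilization by itself does not show that threading made frame latency worse; that needs profiling.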
It was never intended to be a consumer-level chip.
I take it you weren't around at the time? I remember many magazine articles about how Itanium was going to replace x86 everywhere, before it turned out to suck so bad at running x86 code.