AMD Catalyst Driver To Enable Mantle, Fix Frame Pacing, Support HSA For Kaveri



Posted by timothy on Thursday January 30, 2014 @05:22AM from the next-step-is-the-optic-nerve dept.

MojoKid writes "AMD has a new set of drivers coming in a couple of days that are poised to resolve a number of longstanding issues and enable a handful of new features as well, most notably support for Mantle. AMD's new Catalyst 14.1 beta driver is going to be the first publicly available driver from AMD that will support Mantle, AMD's "close to the metal" API that will let developers wring additional performance from GCN-based GPUs. However, the new drivers will also add support for the HSA-related features introduced with the recently released Kaveri APU, and will reportedly fix the frame pacing issues associated with Radeon HD 7000 series CrossFire configurations. A patch for Battlefield 4 is due to arrive soon as well, and AMD is claiming performance gains in excess of 40 percent in CPU-limited scenarios but smaller gains in GPU-limited conditions, with average gains of 11 to 13 percent overall." First time accepted submitter Spottywot adds some details about the Battlefield 4 improvements, writing that Johan Andersson, one of the Technical Directors in the Frostbite team, says that the best performance gains are observed when a game is bottlenecked by the CPU, "which can be quite common even on high-end machines." "With an AMD A10-7850K 'Kaveri' APU, Mantle provides a 14 per cent improvement; on a system with an AMD FX-8350 and Radeon 7970, Mantle provides a 25 per cent boost; while on an Intel Core i7-3970X Extreme system with 2x AMD Radeon R9 290X cards, a huge 58 per cent performance increase was observed."


High end cpu's get little to no boost by Billly+Gates · 2014-01-30 05:29 · Score: 3, Interesting

MaximumPC paints this a little bit differently: only lower end CPUs get a big boost, in conjunction with higher end AMD cards.
I guess we will wait and see with benchmarks later today when 14.1 is released.
This is great news for those like me on older 2.6 GHz Phenom II systems who can afford to upgrade the RAM, the video card, and to an SSD, but not the CPU without a whole damn new system. I use VMware, and this obsolete system has a 6 core CPU and hardware virtualization support. Otherwise I would upgrade, but only a Core i7 or a higher end AMD FX-8350 has the same features for non-gaming tasks. Being able to play Battlefield 4 on this soon with high settings at 1080p would be great!

--
http://saveie6.com/
1. Re:High end cpu's get little to no boost by edxwelch · 2014-01-30 05:55 · Score: 2
  
  So, MaximumPC must not consider the i7-3970x Extreme mentioned a high end CPU. Because that gets a 58% boost.
2. Re:High end cpu's get little to no boost by Anonymous Coward · 2014-01-30 06:03 · Score: 5, Informative
  
  Some games which were really slow on my system like Star Wars the old republic have improved in later patches as they now spread the tasks across all 6 cpu cores.
  However there is lag sometimes when the cpu usage is at only 40%. This is because of synchronization between all the cores waiting on the other to finish something etc. That is one of the drawbacks of parallelization and why Intel's Itanium failed. Great for servers but anything where data needs to be exchanged between the different parts of the program via threads hits bottlenecks.
  So the icore7 uses crazy mathematical algorithms to execute data before it even arrives to save bandwidth to get insane IPC which is why AMD can't compete. But if you have a heavily threaded app that is latency intensive like a game it can be choppy even with low cpu utilization.
  There is so much wrong with this post.
  First, Itanium didn't fail due to difficulties in parallelizing things. Software never ramped up due to low market penetration and the fact that they shoved instruction execution back onto the compiler writers, it had poor performance for x86 code, and it was never targeted at anything but big server iron. It was never intended to be a consumer level chip. Another big reason Itanium failed was the introduction of AMD64.
  Secondly, the anecdotal latency that you experience in SWToR, despite CPU utilization being only 40%, is unlikely to be due to "core waiting on the other to finish something", and I challenge you to present a heap dump illustrating such a block correlated with your metric for latency. If you have not done an in-depth analysis you'd have no way to know, but if you did I'd be curious as to your findings.
  Finally, I have no idea why you would think that the i7 (I assume that's what you meant by icore7) "execute[s] data before it arrives." That doesn't even make sense. What you are most likely referring to is out-of-order execution, or possibly branch prediction, both features that are also present in AMD chips and earlier Intel chips going back to the Pentium Pro. The better IPC of the i7 certainly has nothing to do with magical future-seeing math and more to do with better execution units, OoO execution resources and superior FPU hardware.
  It is true that in general games have not been able to scale to use 100% of your CPU 100% of the time, but it's not for the reason that you have stated, and I'm quite doubtful that threading has introduced the type of latency a human would notice in the equation as you describe. There is a latency/throughput trade-off, but it is quite possible to achieve superior frame latencies with multiple cores than with single cores.
3. Re:High end cpu's get little to no boost by 0123456 · 2014-01-30 06:05 · Score: 3, Informative
  
  It was never intended to be a consumer level chip.
  I take it you weren't around at the time? I remember many magazine articles about how Itanium was going to replace x86 everywhere, before it turned out to suck so bad at running x86 code.
4. Re:High end cpu's get little to no boost by Anonymous Coward · 2014-01-30 06:17 · Score: 2, Informative
  
  I was around at the time and before. Itanium was touted as eventually replacing x86 years off in the hype phase, but it was only ever released as a server chip (with an order of magnitude more transistors and cost than any consumer chip), and any notion that it would ever be for end users was dropped. I'm not sure if the early hype had anything to do with Intel's intentions or simply everyone else's daydreams, but Itanium was designed specifically to compete with RISC on big iron. Intel was not even considering 64-bit consumer chips at that time.
  While they may have had notions that decades from then consumers would run Itanium after a bunch of process node shrinks, that has more to do with the EPIC ISA and a lot less to do with the Itanium chip itself.
  Since the x86 performance was basically a P4 built alongside the EPIC core, they certainly could have devoted more transistor budget to it if that was critical for them, but EPIC was designed to replace x86 ultimately, not be a high speed x86 execution engine.
  Don't get me wrong, Itanium was a disaster and I'm not defending Intel on it. If it wasn't for AMD we wouldn't have seen an update to 32-bit x86 or gotten consumer level 64-bit for a long, long time.... But no version of Itanium or Itanium 2 was targeted for you to buy in a Dell.
Idle is not the only measure... by Junta · 2014-01-30 05:47 · Score: 2

We are talking about a real time application, so even without 100% load over a relatively large sampling interval, performance can be degraded.
Let's assume that you have 2 sequential things that cannot be overlapped: CPU setup and GPU processing. You cannot begin CPU setup of the next frame until the GPU is done with the current frame (gross oversimplification, but there are sequences that bear some resemblance to this).
So a hypothetical CPU takes 1 ms to set up a frame, and then the hypothetical GPU takes 4 ms to render it. That's 20% CPU usage reported on average over as small a sampling interval as 5 ms, and a throughput of 200 frames per second.
Let's say that the CPU now takes 10 times less time, 100 µs to set up a frame, and the GPU still takes 4 ms to render it. Each frame now takes 4.1 ms, so you have gotten about a 22% speedup (roughly 244 frames per second). Even though the CPU was not fully utilized, it did represent a throttling factor in FPS.
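The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same oversimplified model (CPU setup and GPU render strictly back to back, with no overlap); the function names are made up for illustration.

```python
def frames_per_second(cpu_ms, gpu_ms):
    """FPS when CPU setup and GPU rendering are strictly serialized per frame."""
    return 1000.0 / (cpu_ms + gpu_ms)


def cpu_utilization(cpu_ms, gpu_ms):
    """Fraction of each frame during which the CPU is actually busy."""
    return cpu_ms / (cpu_ms + gpu_ms)


# Original case: 1 ms of CPU setup followed by 4 ms of GPU rendering.
before_fps = frames_per_second(1.0, 4.0)
before_util = cpu_utilization(1.0, 4.0)

# CPU setup made 10x cheaper (100 microseconds); GPU time unchanged.
after_fps = frames_per_second(0.1, 4.0)
after_util = cpu_utilization(0.1, 4.0)

# Relative gain in frame rate from the cheaper CPU setup alone.
speedup = after_fps / before_fps - 1.0

print(f"before: {before_fps:.1f} fps at {before_util:.0%} CPU")
print(f"after:  {after_fps:.1f} fps at {after_util:.1%} CPU")
print(f"speedup: {speedup:.1%}")
```

In this model the CPU never shows more than 20% busy, yet trimming its share of the frame still raises the frame rate by a bit over a fifth, which is the point: average utilization says little about who is on the critical path.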

--
XML is like violence. If it doesn't solve the problem, use more.
AMD strategic... by Junta · 2014-01-30 05:56 · Score: 3, Interesting

So gamers get a small boost to their gaming rigs, but that's not *really* the goal for AMD.
The real issue is that AMD demonstrably lags Intel in *CPU* performance, but not GPU. OpenGL/Direct3D implementations cause that to matter, meaning AMD's CPU business gets dinged as a valid component in a configuration that will do some gaming. Mantle diminishes the importance of the CPU to most gaming, so their weak CPU offering is made workable to sell their APU based systems. It can do so cheaper than Intel+Discrete GPU while still reaping a tidy profit.

--
XML is like violence. If it doesn't solve the problem, use more.
Direct3D "light weight" runtime by edxwelch · 2014-01-30 06:02 · Score: 2

On a related note, Microsoft are working on an update to Direct3D to provide a "light weight" runtime similar to the XBone. Presumably, this will solve the same draw call issues that Mantle deals with.
Unfortunately, it doesn't sound like the update will happen anytime soon - maybe for Windows 9?
Also, it's unclear whether they will back port the update to Windows 7.
https://blogs.windows.com/wind...
Re:Sounds like a bit of a bust. by Anonymous Coward · 2014-01-30 06:40 · Score: 5, Insightful

I think that misses the point. CPUs aren't the limiting factor in part because game devs limit the number of draw calls they issue to keep the CPU from becoming one (because not everybody has a high end CPU). Mantle may not offer vastly more performance in the short term, but it will enable more in game engines in the long term if the claims DICE and AMD make are accurate. That doesn't get away from the cost of lock-in. Like any new release of this sort, Mantle may never catch on, but it may push DX and GL to change in a Mantle-like direction, which would then benefit all developers.
Re:R9 290X vs 650 Ti Boost by Mashiki · 2014-01-30 07:19 · Score: 2

NVidia driver quality vs AMD driver quality...
Chances are in the GP's case there's something else going on, since I've got a 7950 and an FX-6300, and can run Skyrim on the max settings while getting no frame hitching. Then again it could just be the 290, since there were TDP issues on some batch runs from what I've heard.
But driver quality? Nvidia's drivers are the reason I went to AMD, after nearly a year of them blaming the end user for constant TDR crashes, then deciding to man up and pay to have rigs in the US shipped to California for TDR testing, then releasing a driver which mostly fixed the TDR issues, where they were very quiet about why it was crashing (all they said was "we fixed it in most cases"). Though a few intrepid people found it had to do with the drivers dropping the core and RAM voltages so low that the cards became unresponsive and unstable. Then there were the 5-7 months where the 290-3xx series drivers were causing hardlocks across the board for 400, 500, and 600 series owners.
I do remember when ATI's drivers were shit, I owned a couple of radeon cards during that time. But since AMD bought them out, their driver quality has been increasing quite a bit.

--
Om, nomnomnom...
Re:R9 290X vs 650 Ti Boost by TheRealMindChild · 2014-01-30 07:28 · Score: 2

Hell, I'm stuck on Nvidia's 314.22 drivers because every driver from the 32x and 33x series causes my machine to freeze or restart the driver in a "safe mode". You can read the many links to this horror via https://www.google.com/search?q=nvidia+video5

--

"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
Re:Well by Pino+Grigio · 2014-01-30 07:53 · Score: 2

It's not just that, it's the locking behaviour needed for correct concurrency. There's no point in multi-threading a lot of cores that are going to spend most of their time waiting on another thread to release a lock. Even Mantle is single threaded. i.e. There's a queue for the GPU, one for Compute and one for DMA, on different threads. There aren't going to be 8 threads all using the GPU at once. It'll still be serialised by the driver, just more efficiently than you can do it with existing D3D or GL drivers.
Here is a more important question by DarkAce911 · 2014-01-30 08:21 · Score: 2

Does it increase my mining hash speed? If you are buying AMD cards for gaming at today's prices, you are an idiot.
Seems like it'll screw them in the long run by Sycraft-fu · 2014-01-30 08:38 · Score: 3

Well, assuming it takes off, which I don't think it will. If this stuff is truly "close to the core" as the Mantle name and marketing hype claim, then it'll only work so long as they stick with the GCN architecture. It won't work with any large architecture changes. So that means that they either have to stick with GCN forever, which would probably cripple their ability to make competitive cards in the future as things change, or they'd have to abandon support for Mantle in newer cards, which wouldn't be that popular with the developers and users that had bought in. I suppose they could also provide some kind of abstraction/emulation layer, but that rather defeats the purpose of a "bare metal" kind of API.
I just can't see this as being a good thing for AMD in the long run, presuming Mantle truly is what they claim. The whole reason for things like DirectX and OpenGL is to abstract the hardware so that you don't have to write a renderer for each and every kind of card architecture, which does get changed a lot. If Mantle is tightly tied to GCN then that screws all that over.
So either this is a rather bad desperation move from AMD to try and make up for the fact that their CPUs have been sucking lately, or this is a bunch of marketing BS and really Mantle is a high level API, but just a proprietary one to try and screw over nVidia.
1. Re:Seems like it'll screw them in the long run by higuita · 2014-01-30 23:55 · Score: 2
  
  Don't forget that all new consoles today are using AMD cards, so the game developers are using Mantle for those, and they could also do the same on PC with little change.
  
  --
  Higuita
Re:Well by Bengie · 2014-01-30 09:56 · Score: 2

Even Mantle is single threaded
Single threaded per queue, and you can create as many queues as you want per application. During setup, you create all of the queues you want to use, then register them. Once registered, the GPU can read directly from the queue instead of requiring a system call. This pretty much removes those pesky 30,000 clock cycle system calls. Kaveri goes a step further and registers queues directly with the hardware. Because each queue item is exactly one cache line in size and the APU shares cache coherency and protected memory, the latency between the CPU and GPU is about the latency of the cache, which is around 10ns.

Because system calls were used prior to Mantle, a 10ns communications latency was completely dwarfed by the 30,000 cycle system call. Now we're getting close to zero call overhead and 10ns latency. Obviously the GPU must have free resources to start working, and some code may need to be run in order for the GPU to schedule, but the main point still remains.
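
To put those two numbers side by side, here is a rough back-of-the-envelope sketch. The 3 GHz clock is an assumption picked only to convert cycles into time, the submissions-per-frame count is a hypothetical figure rather than anything measured, and treating each submission's cost as purely additive is a simplification.

```python
# Figures quoted in the comment above.
SYSCALL_CYCLES = 30_000      # cost of one system call, in clock cycles
QUEUE_LATENCY_NS = 10.0      # CPU-to-GPU latency via a registered queue

# Assumptions for illustration only; neither comes from the thread.
ASSUMED_CLOCK_HZ = 3.0e9                     # hypothetical 3 GHz CPU
HYPOTHETICAL_SUBMISSIONS_PER_FRAME = 1_000   # made-up workload size


def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles / clock_hz * 1e9


syscall_ns = cycles_to_ns(SYSCALL_CYCLES, ASSUMED_CLOCK_HZ)
ratio = syscall_ns / QUEUE_LATENCY_NS

# Per-frame cost if every submission paid the full price, in milliseconds.
syscall_ms_per_frame = syscall_ns * HYPOTHETICAL_SUBMISSIONS_PER_FRAME / 1e6
queue_ms_per_frame = QUEUE_LATENCY_NS * HYPOTHETICAL_SUBMISSIONS_PER_FRAME / 1e6

print(f"one system call: about {syscall_ns / 1000:.0f} microseconds")
print(f"system call vs queue latency: about {ratio:.0f}x")
print(f"per frame: {syscall_ms_per_frame:.2f} ms vs {queue_ms_per_frame:.2f} ms")
```

Under these assumptions a single system call costs on the order of 10 microseconds, about a thousand times the quoted queue latency, which is why the 10ns figure only starts to matter once the system call is out of the path.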