Slashdot Mirror


AMD Confirms Linux 'Performance Marginality Problem' On Ryzen (phoronix.com)

An anonymous reader writes: Ryzen customers experiencing segmentation faults under Linux when firing off many compilation processes have now had their problem officially acknowledged by AMD. The company describes it as a "performance marginality problem" affecting some Ryzen customers and only on Linux. AMD confirmed Threadripper and Epyc processors are unaffected; they will be dealing with the issue on a customer-by-customer basis, and their future consumer products will see better Linux testing/validation. Ryzen customers believed to be affected by the problem can contact AMD Customer Care. Michael Larabel writes via Phoronix: "With the Ryzen segmentation faults on Linux they are found to occur with many, parallel compilation workloads in particular -- certainly not the workloads most Linux users will be firing off on a frequent basis unless intentionally running scripts like ryzen-test/kill-ryzen. As I've previously written, my Ryzen Linux boxes have been working out great except in cases of intentional torture testing with these heavy parallel compilation tasks. [AMD's] analysis has also found that these Ryzen segmentation faults aren't isolated to a particular motherboard vendor or the like, contrary to rumors/noise online due to the complexity of the problem."

120 comments

  1. Just like FDIV by Anonymous Coward · · Score: 3, Insightful

    Will only affect a few people, so we aren't replacing any CPUs. Way to hand Intel the business, AMD!

    1. Re:Just like FDIV by Anonymous Coward · · Score: 0

      FDIV was a hardware bug that affected all software that ran on it. This is apparently isolated to Linux.

    2. Re: Just like FDIV by Anonymous Coward · · Score: 0

      And all the BSDs, so it's by far not Linux-only.

    3. Re:Just like FDIV by ravenshrike · · Score: 1

      Except it doesn't apply to Threadripper, Epyc, or Ryzen Pro. And it doesn't affect all of normal Ryzen either. So the entirety of the market they're handing to Intel is those buying personal systems who run large amounts of parallel compilation workloads and who don't feel like RMAing till they get a chip without the defect.

    4. Re:Just like FDIV by Anonymous Coward · · Score: 0

      Doesn't apply ... still has to be verified.

      So basically, until people have crashes, and discover it's because of their CPU, THEN they can have a working chip?
      If they don't find a fix that works for everybody, this ship is dead. And as far as I'm concerned they are too.

    5. Re:Just like FDIV by arglebargle_xiv · · Score: 3, Insightful

      Except it doesn't apply to Threadripper, Epyc, or Ryzen Pro.

      We don't even know if it's an AMD problem, it could be any one of a number of previously-unnoticed Linux issues that happen to show up on Ryzen (note that the text says "may also affect other Unix-like operating systems", not "exists under FreeBSD as well", so currently it's pure speculation that it extends past Linux). We'll have to wait and see what further investigation turns up...

    6. Re: Just like FDIV by Anonymous Coward · · Score: 0

      So the twelve people who are affected, they get new hardware. This is a non-story.

    7. Re:Just like FDIV by Gr8Apes · · Score: 1

      Except it doesn't apply to Threadripper, Epyc, or Ryzen Pro.

      We don't even know if it's an AMD problem, it could be any one of a number of previously-unnoticed Linux issues that happen to show up on Ryzen (note that the text says "may also affect other Unix-like operating systems", not "exists under FreeBSD as well", so currently it's pure speculation that it extends past Linux). We'll have to wait and see what further investigation turns up...

      That's more than a little interesting. Wonder if it affects NetBSD? If both FreeBSD and NetBSD are free of this error, I may have my next system.

      --
      The cesspool just got a check and balance.
    8. Re:Just like FDIV by Anonymous Coward · · Score: 0

      That's nonsense. The FDIV bug affected only software that used the FDIV instruction, and only with a certain range of operands.
      Most software wasn't affected at all.
      In the same way, the only software that has been found using the borked codepaths on this chip runs on Linux, but that doesn't make the chips not defective.
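      For reference, the operand pair usually quoted for the FDIV bug makes a compact worked check. This is a minimal sketch: the function names and the tolerance are illustrative assumptions, and the value 256 is the result widely reported for flawed Pentiums, not something this snippet can reproduce on a correct FPU.

```python
def fdiv_residual(x: float = 4195835.0, y: float = 3145727.0) -> float:
    """Return x - (x / y) * y.

    With a correct divider this is zero up to floating-point rounding.
    Flawed Pentiums were widely reported to return 256 for the default operands.
    """
    return x - (x / y) * y


def division_looks_sane(tolerance: float = 1.0) -> bool:
    # The tolerance is an illustrative assumption: far above rounding noise,
    # far below the reported erroneous residual.
    return abs(fdiv_residual()) < tolerance
```

      Only a narrow set of operand pairs triggered the faulty result, which is why most software never noticed.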

    9. Re: Just like FDIV by Anonymous Coward · · Score: 0

      Are you serious? Linux geeks are hardcore geeks? And this is a non story?

      I just got myself a Ryzen PC less than a week ago. This is very relevant to me.
      I am on Linux Gentoo, ALL IS COMPILED THERE. All programs/packages. Everything.

      And btw, no, no issues with my Ryzen so far.

      It doesn't affect everyone. I ran emerge with --jobs 12 (compile up to 12 packages at once) and -j12 for make (gcc), still no crashes...

  2. oblig by Anonymous Coward · · Score: 5, Informative

    certainly not the workloads most Linux users will be firing off on a frequent basis

    I run Gentoo you insensitive clod!

    1. Re: oblig by Anonymous Coward · · Score: 1

      And taking a single step on their own is probably the heaviest thing they do

    2. Re: oblig by GameboyRMH · · Score: 2

      Have you tried watching H.265/HEVC-encoded anime? :-P

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    3. Re:oblig by Misagon · · Score: 4, Insightful

      How was the parent modded as "Funny"?
      This is definitely not funny. Some users of compiled distros such as Gentoo have encountered the bug on a fairly regular basis when trying to compile the distro -- which is needed to install it.

      --
      "We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
    4. Re:oblig by Anonymous Coward · · Score: 0

      Me too, since about a week ago; no problems here with a 1600 CPU.

  3. MT was what AMD had over Intel by Anonymous Coward · · Score: 1

    Multi-threaded performance was the main advantage that Ryzen had over Intel. Single threaded is still Intel's game and now you are telling me that I can't run make -j on all my cores?

    1. Re:MT was what AMD had over Intel by F.Ultra · · Score: 3, Insightful

      Well you can (run make -j), just be prepared to rerun that if/when it segfaults... For most people so far they only get the segfault if they do "make clean && make -jX" a few times, so a single make of even a large project should probably work most of the time. It will be interesting to see if/when AMD will be able to fix it, and particularly why Windows does not seem to suffer from it yet.
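      The "just rerun it" approach can be sketched as a small retry wrapper. This is only an illustration: run_until_success and max_attempts are hypothetical names, and a build that passes on a retry says nothing about whether its output is correct.

```python
import subprocess
import sys


def run_until_success(cmd, max_attempts=3):
    """Run cmd repeatedly; return the 1-based attempt that succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        # make reports a non-zero exit status when a compiler child fails;
        # on POSIX a negative code means the direct child was killed by a signal.
        print(f"attempt {attempt} failed with code {result.returncode}",
              file=sys.stderr)
    return None
```

      A hypothetical call would be run_until_success(["make", "-j16"]), with the job count adjusted to the machine.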

    2. Re:MT was what AMD had over Intel by Rockoon · · Score: 1

      Maybe someone can put on the table the key differences between Linux and Windows thread scheduling, because surely it's in there somewhere.

      --
      "His name was James Damore."
    3. Re: MT was what AMD had over Intel by Anonymous Coward · · Score: 1

      I suspect there are more users compiling under Linux. I run 16 thread parallel compiles daily under Win10 and see a lot of internal compiler errors on my 1700x, errors my fx8370 never displayed on 8 thread builds. There's a good chance they simply haven't had any reports for Windows yet.

    4. Re: MT was what AMD had over Intel by Anonymous Coward · · Score: 0

      It may be solvable with a microcode update, but in the worst case AMD needs to replace the affected chips with a newer revision.

    5. Re:MT was what AMD had over Intel by OneAhead · · Score: 2

      This. The very existence of that flag in a ubiquitous utility that is commonly run even by end users (of at least some distros ;-)) makes the following sentence in TFA sound quite ignorant at best (and dishonest at worst):
      With the Ryzen segmentation faults on Linux they are found to occur with many, parallel compilation workloads in particular -- certainly not the workloads most Linux users will be firing off on a frequent basis unless intentionally running scripts like ryzen-test/kill-ryzen.

    6. Re:MT was what AMD had over Intel by Anonymous Coward · · Score: 0

      -- quote --
      Well you can (run make -j ), just be prepared to rerun that if/when it segfaults...
      -- unquote --
      Sounds like a microsoft-esque way to downplay an issue :D

    7. Re:MT was what AMD had over Intel by rew · · Score: 2

      Wait!

      What is happening is that the CPU will mis-execute some instruction so that some "data" becomes invalid. When a compiler is running such data is often a pointer and the wrong pointer often results in a segfault.

      But especially while we don't know what's going on exactly, this could also corrupt data. i.e. give the wrong results in a computation, or result in a bad binary when the running program is a compiler.

      So you're suggesting I trust the resulting binaries when the compilation doesn't segfault? Even when I have to try several times? Ehh. not me!
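      One hedged way to act on that distrust is to repeat the same build and compare hashes of the output. A minimal sketch, assuming the build is deterministic (many real builds embed timestamps and are not); outputs_agree and its build callable are hypothetical names. A mismatch shows something went wrong, while agreement proves nothing.

```python
import hashlib
from typing import Callable


def outputs_agree(build: Callable[[], bytes], runs: int = 3) -> bool:
    """Run build several times and report whether every run produced
    byte-identical output (compared via SHA-256 digests)."""
    digests = {hashlib.sha256(build()).hexdigest() for _ in range(runs)}
    return len(digests) == 1
```

      Here build stands in for whatever compiles the project and returns the resulting binary's bytes.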

    8. Re:MT was what AMD had over Intel by Tough+Love · · Score: 1

      Multi-threaded performance was the main advantage that Ryzen had over Intel.

      This type of processor bug can typically be fixed with a microcode patch. Mainly a matter of getting sufficient engineering resources on it and isolating the cause. The publicity certainly helps that process, as does the extensive community testing.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    9. Re:MT was what AMD had over Intel by F.Ultra · · Score: 1

      Learn to live in the danger zone :)

    10. Re:MT was what AMD had over Intel by Anonymous Coward · · Score: 0

      I think it is only commonly run on Gentoo, or otherwise if the user is a developer. Most Linux users don't need to compile programs these days; I can't really remember the last time I did, because everything I want to run is either in my distro's repositories, or at the least has a package for it. The flag exists because it is very useful if you have to compile large codebases and have multiple cores available, but it isn't something most users will find themselves using.

  4. Micro needle in mega haystack. by saccade.com · · Score: 1

    I do not envy the crew assigned to tracking that bug down.

    1. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      The comment suggests they already found it, but it isn't fixable.

      Sounds like a localized overheat or localized power distribution issue caused by a bunch of factors AMD assumed were so unlikely they would never happen.

    2. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 5, Insightful

      Processors are not components where you design for the average case and accept failures during peak load. How can a single byte of anything compiled on this processor from now on be trusted not to have been silently corrupted? Does multithreaded disk access run the risk of silently corrupting my files? Until fixed, this processor is toast.

    3. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      I have no idea how this isn't +5. These aren't Windows/Linux/BSD processors, these are x86 processors. For this failure to happen, they are clearly disobeying their own fucking spec in at least one way. You shouldn't be able to generate a segfault no matter what fucking scheduler you use, if any.

    4. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      I do not envy the crew assigned to tracking that bug down

      Wow, you must really suck at hardware design. Or are bitter that people do know how to debug stuff like this.

    5. Re: Micro needle in mega haystack. by lucm · · Score: 1

      Or it could just be buggy Chinese spyware.

      --
      lucm, indeed.
    6. Re: Micro needle in mega haystack. by LostMyBeaver · · Score: 5, Insightful

      I tend to buy at least one AMD system from each generation to give it a go and see if we can't get somewhere without these problems.

      amd486 - system/memory clock (same thing back then) was unstable and too high. This caused all kinds of issues with Maxwell's theorem and it was impossible to run a VESA local bus IDE or VGA adapter reliably. Also consider that the CPU was implemented almost entirely without x86 debug registers which made debugging GPFs a complete nightmare. Very often, Windows NT 3.1 and 3.5 would crash on there and people immediately pointed a finger at Microsoft for the GPFs and blue screens. In reality on AMD CPUs, nearly 50 percent of the GPFs were actually AMD's fault.

      amd586 and 686... these CPUs were huge improvements, but there was some weird issue with the NMI that made debugging code almost impossible. They also had a really bad tendency of bursting capacitors on the system board

      AMD with later generations
      - built in MMU was implemented for users, not servers and developers. it was absolutely horrifying wondering whether my code was going to come out right. memory protection was more of a suggestion to them than a rule.
      - AMD was killing every desktop benchmark, I actually loved AMD at this time as I was playing games and I had bought myself four Shuttle Cubes with the nVidia chipsets and AMD CPUs. I programmed on a dual-Celeron system at work with Linux because it was just faster and better.
      - P4 vs Athlon days. Intel botched the P4 in so many ways it was terrible. It was almost not a challenge for AMD to out-perform Intel as the P4 architecture was an endless mess of cache miss hell. Now... let's be REALLY REALLY fair. P4 would have been the ultimate winner if CPUs were meant for DOS. What I mean is that on a system where there is only a single task (not including hardware interrupt handlers) the P4 pipeline is still a thing of true beauty. But the whole world had moved to Windows XP (got XP and my first P4 on the same shopping trip) and people left DOS, Windows 95/98/ME behind to run a real operating system for the first time... And the P4 was dead before it left the door. The Athlon which was basically equal to a higher clocked Pentium III with an internal MMU ... which in itself was the best thing they ever did.... was amazingly fast. Instead of making a fancier CPU, AMD just kept making the same one and in each generation, focused on moving more bottlenecking systems on-die so the chip performance wouldn't be throttled by external buses. Unfortunately, during this era, both Intel and AMD sucked for development. GCC was a hot wreck as it was still running the crap based on Richard Stallman's code, 2.77 was useless for optimization and 2.89-2.95 was absolutely unreliable. RedHat was trying to make a living porting Linux to every damn device and make it run on ARM (SHITTY DEVELOPMENT PLATFORM at the time), etc... Visual C++ was great and Intel C++ was amazing but you weren't allowed to say that out loud. See, Microsoft was truly evil at the time.
      Following generations of AMD (not including Ryzen)
      - Branding hell... no one that didn't take an obsessive interest in AMD could tell what generation of chip they were buying or even what tier. Even now, having owned many of them, I couldn't tell you which ones were good or bad because I was lost. Intel's current numbering is bad... but not that bad.
      - Memory problems. Yeh... wasted 5 days trying to debug a buffer overflow... then I switched to my Intel based laptop and it showed up in the debugger on the first try. AMD still can't make a fucking MMU. How the hell are you supposed to write a memory manager for an operating system if you can't trap buffer overflows when you clearly defined in the GDT and/or LDT where it should set bounds.
      - Order of execution. On an Intel Core CPU, I can write multiprocessing code, set core affinity based on the position of the core relative to the ring buses. Then I can queue tasks that read/write L1/L2/L3 cache and based on the queui

    7. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      Excellent post.

    8. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      That's harsh. This situation looks FDIV like to me; real potential to harm AMD and the crew will be feeling the weight of that.

      And the root cause could be something the CAE tools should have picked up - power integrity, SSN, thermal - but which they don't because they're new like the process. If so we may never hear about it because of the wording of the multi million dollar contract for those tools.

    9. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      It's a good thing that you have never used any Intel processors then. Ever seen their errata?

    10. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      P4 would have been the ultimate winner if CPUs were meant for DOS.

      I know you were speaking speculatively, but I disagree with this, mainly because of heat. Pentium 4s ran very hot, often idling in the high 50s (even before the infamous Prescott range), compared to faster, cheaper AMD chips of the same time which tended to idle in the 30s. DOS didn't handle letting the CPU idle its spare cycles, so the temperature problem would have been more severe.

    11. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      The P4 pipeline was a thing of true crap. Well, "the P4" was actually two CPUs and two significantly different pipelines, Willamette and Prescott, both crap.

      It wasn't context switching that slowed them down, it was just about everything. From the horribly inefficient trace cache that had to be flushed at the drop of a hat to the tiny L1 Dcache to the dependency replay storms to the single issue x86 decoder to the huge mispredict latency. And it reached reasonably high clock speeds for its age, but at the cost of a lot of power.

      Also

      - Order of execution. On an Intel Core CPU, I can write multiprocessing code, set core affinity based on the position of the core relative to the ring buses. Then I can queue tasks that read/write L1/L2/L3 cache and based on the queuing mechanism, I can ensure cache coherence without the use of thread synchronization mechanisms like mutexes or semaphores. On HPC applications, this gives a general processing performance increase of 200-300% because the individual local cores don't have to sit and wait every time someone wants to write the cache. AMD has not documented their state machines for how they pass data to and from the CPU caches. Even if they did, Intel appears to have standardized theirs across steppings and generations so that it's possible to write your code once and run it for A LONG TIME like that. AMD probably will change this a few times without maintaining documentation because AMD never documents anything other than instruction set and electrical, mechanical and thermal.

      That is not out of order execution. Out of order execution occurs within a single thread of execution, a single core. That's memory consistency and cache coherency. Intel has not exactly documented exact behavior of their multiprocessor fabric, but it is reasonable. AMD has as well though. Certainly AMD implements cache coherency which is a pretty fundamental requirement, but they have also documented their coherency protocols to about the level that Intel have (e.g., see MOESI description in AMD architecture programmers guide) and there is no question of "individual cores sitting and waiting every time someone wants to write to the cache".

    12. Re: Micro needle in mega haystack. by Anonymous Coward · · Score: 0

      How can any code compiled on previous CPUs be trusted to not have been silently corrupted?

      Why do you think that "segmentation fault" is a silent failure mode?

    13. Re: Micro needle in mega haystack. by beerbear · · Score: 1

      Apparently the segfault happens if the data corruption happens to affect a pointer. If it's not a pointer, you get no segfault, but could have silent corruption.
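      A toy model of that distinction, with a bounds-checked list standing in for memory and IndexError standing in for a segfault. Everything here (flip_bit, load, the four-entry memory) is hypothetical and only illustrates the reasoning; it is not how the real fault manifests.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted, as a stand-in for corruption."""
    return value ^ (1 << bit)


def load(memory, address):
    """Bounds-checked load; IndexError plays the role of a segfault."""
    if not 0 <= address < len(memory):
        raise IndexError("simulated segfault")
    return memory[address]


memory = [10, 20, 30, 40]

# Corrupting a high bit of a "pointer" lands outside memory: a loud failure.
# Corrupting a low bit of a "pointer" stays in range: wrong data, no fault.
# Corrupting the data itself never faults: 30 silently becomes 31.
```

      So load(memory, flip_bit(2, 20)) raises, while load(memory, flip_bit(2, 0)) and flip_bit(memory[2], 0) both return wrong values without any error.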

      --
      Hold my beer and watch this!
  5. so how does that work? by Jodka · · Score: 4, Insightful

    It is not like the CPU is testing for that particular combination of conditions alone and conditionally segfaulting. Really, there is a flaw in the CPU design which so far has only been demonstrated to exhibit itself under those conditions. That is much more worrying than the summary leads us to believe.

    I like AMD and Ryzen is a good bargain compared to Intel. It will be my next CPU purchase, though I am holding out until they fix the bug. But I don't like the way they are minimizing the impact.

         

    --
    Ceci n'est pas une signature.
    1. Re:so how does that work? by Kjella · · Score: 1

      It is not like the CPU is testing for that particular combination of conditions alone and conditionally segfaulting. Really, there is a flaw in the CPU design which so far has only been demonstrated to exhibit itself under those conditions. That is much more worrying than the summary leads us to believe.

      Well, from the fact that RMAs have worked for some people and not for others as well as the non-deterministic crashes it seems like it's down to production variation, some chips get unstable and corrupt data if hammered a particular way. Most likely there'll be some microcode update to stagger the problematic sequence and a new stepping increasing the safety margin to fix it properly. Still not good news for AMD, since those who can't easily verify their results will stay away until the scope of the problem is known.

      --
      Live today, because you never know what tomorrow brings
    2. Re:so how does that work? by Anonymous Coward · · Score: 0

      No surprise there. AMD has a history of pushing out unpolished chips meant for high speed and low stability. They're like American muscle cars; awesome, as long as you're going straight ahead and you don't mind driving with a crate of motor oil in the trunk.

    3. Re:so how does that work? by Anonymous Coward · · Score: 0

      You make it sound as though AMD is almost as bad as Intel, but I can't remember stability ever being a systematic problem with AMD production parts before.

    4. Re:so how does that work? by tlhIngan · · Score: 1

      It is not like the CPU is testing for that particular combination of conditions alone and conditionally segfaulting. Really, there is a flaw in the CPU design which so far has only been demonstrated to exhibit itself under those conditions. That is much more worrying than the summary leads us to believe.

      Well, think of a modern CPU as a collection of execution units. In most CPUs, execution units overlap in functionality - a complex instruction may issue several loads (memory to CPU) and stores (CPU to memory), cause several integer units to be called into play (to actually calculate data, or compute addresses) which may cause other loads and stores (especially if it requires hitting the page tables) and more computation. Oh yeah, and data is held in registers, which are renamed - non-dependent uses of a register will allocate a new temporary register between instructions, while dependent registers may be worked on independently if it's not needed until later (result forwarding - if one instruction is computing a complex memory address, and the next one uses it, that instruction can work without knowing what the final destination is until the last minute - usually a cycle before when the result is finally computed, and the result forwarded so it's ready when the following instruction actually needs it).

      This complex dance is coordinated with a control unit controlled via microcode. And often, there will be combinations that get the control unit completely confused (especially so in hyperthread mode, which uses one core to emulate two processors, so one control unit has twice the accounting work). Just a bug that perhaps hangs onto a result a bit longer than it should causing another instruction using the same unit to corrupt the value. But that only happens if you get the control unit in a state caused by a series of instructions and then use another instruction that finally collapses the whole thing. Especially ones that happen using loads that get scheduled on the same core.

    5. Re:so how does that work? by AmiMoJo · · Score: 3

      All modern CPUs run microcode that is updated on boot by the BIOS. So fixing this will just be a microcode update, i.e. a BIOS update. AMD has been quite good at getting vendors to ship such updates for their motherboards and systems, but if for some reason they don't you could load it via a driver under Linux too.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    6. Re:so how does that work? by rew · · Score: 3, Interesting

      There MUST be some things in hardware to execute anything. While they (the chip manufacturers) have surprised me in the past, not all bugs CAN be fixed with a microcode update.

      A long, long time ago, people wrote "self modifying code". Say for doing bit-operations on parts of the screen buffer, you might pass 1 for AND, 2 for OR, and 3 for XOR. The function could then place the AND/OR/XOR opcode in the middle of the doit loop and then perform the loop.... So one day the manufacturer guarantees that the new machine will execute everything the old one did. Bad move. Turns out the new machine is faster because it prefetches instructions. By the time the code has determined the opcode for inside the loop, the loop (with the last AND/OR/XOR opcode in place) has already been prefetched. This prefetching is at the core of why the machine is fast. Implemented in hardware. Can you fix that with a microcode update? Apparently in the case at hand (PR1ME9955): yes.

      But I can easily see it happen that either you disable the whole prefetching stuff (slow everything down enormously) or you need say an extra comparator ("Is the store happening near my PC, possibly near my prefetch queue?") to allow for "normal" cases to use the prefetch queue, but this special case to flush the queue only when necessary. In any case, the microcode was updated and stuff worked properly again.
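      The stale-prefetch effect can be shown with a toy two-instruction model. This is a sketch under loud assumptions: run, snoop_stores and the memory layout are all made up for illustration and do not describe the actual PR1ME hardware or its microcode fix.

```python
import operator

OPS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}


def run(patch_to: str, snoop_stores: bool) -> int:
    """Instruction 0 patches the opcode of instruction 1, then instruction 1
    executes from the prefetch queue on the operands 0b1100 and 0b1010."""
    memory = ["PATCH", "AND"]      # slot 1 still holds the opcode from the last call
    prefetched = memory[1]         # the hardware has already fetched ahead

    memory[1] = patch_to           # instruction 0: the self-modifying store
    if snoop_stores:
        # The "extra comparator": the store hit a prefetched address, so refetch.
        prefetched = memory[1]

    return OPS[prefetched](0b1100, 0b1010)
```

      Without the comparator, run("XOR", False) still executes the old AND and returns 8; with it, run("XOR", True) returns 6 as the programmer intended.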

  6. Great selling point by Anonymous Coward · · Score: 0

    We have tons of cores so you can do many things at once. Just don't use them all at once!

  7. Don't worry... by ckatko · · Score: 5, Insightful

    ..the faults only happen for people with massive parallel loads.

    You know... the main reason people buy the CPUs.

    1. Re:Don't worry... by alvinrod · · Score: 2

      Well they do say Threadripper and Epyc are unaffected, and I think those chips are a different stepping than the initial batch of Ryzen chips so the problem may already be fixed. It may be possible to fix the others with a firmware update, though who knows how long that will take to roll out depending on other things AMD is working on and their other priorities.

    2. Re:Don't worry... by arth1 · · Score: 1

      Yeah. I was contemplating getting a Ryzen for my new PC internals, which are due for changing out now.
      But if I can look forward to crap like this, it's not even an option - even if it were free, I wouldn't use it.

    3. Re:Don't worry... by Anonymous Coward · · Score: 0

      It doesn't affect current chips, only the first series.

  8. Servers? by Anonymous Coward · · Score: 0

    This seems a bit concerning for their plans to take on Xeons in the server market, which would pretty much all run Linux with high parallel loads.

    1. Re:Servers? by Anonymous Coward · · Score: 0

      They're not using Ryzen in the server market, they've got ThreadRipper and Epyc for that. Which (apparently) aren't affected.

    2. Re: Servers? by Anonymous Coward · · Score: 0

      That you know of. Yet. With your current kernel.

      We need a teardown of the problem, software workarounds, etc, before we know it is not really present, right?

    3. Re:Servers? by arth1 · · Score: 1

      They're not using Ryzen in the server market, they've got ThreadRipper and Epyc for that. Which (apparently) aren't affected.

      No, but they are targeting the workstation market, where a good sized chunk of the users do things like building software. That's one of the factors that make buyers choose more cores and RAM over higher clock speeds.

  9. why would I buy a processor that *might* segfault by iggymanz · · Score: 4, Insightful

    never mind my load type today, what about 2 years from now? why would I spend money on something that *might* segfault and for which the vendor isn't going to provide a solution to *everyone*. case by case basis my ass, that's the sign of a tech hardware vendor which should be shunned.

  10. These are a bear to track down by Anonymous Coward · · Score: 1

    And could still wind up being a Linux fault, though the various Intel errata have this sort of fault showing up a number of times, with multi-byte ops crossing page boundaries or the ilk, so no reason to single out Linux yet. Windows does so much structure-padding everywhere by default it's much less likely to occur there. This is where the ops pipeline dump comes in handy if it's deep enough.

    1. Re:These are a bear to track down by lucm · · Score: 1

      Or maybe the problem does happen on Windows and nobody noticed because they're constantly rebooting to install an update anyways.

      --
      lucm, indeed.
    2. Re:These are a bear to track down by Misagon · · Score: 4, Informative

      It has been confirmed to be a processor bug, not a software bug.
      BSD kernel developer Matt Dillon sent AMD a reproducible test case back in April.
      You can read more about it here.

      --
      "We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
  11. Phoronix FAIL by Anonymous Coward · · Score: 5, Insightful

    Phoronix: "certainly not the workloads most Linux users will be firing off on a frequent basis"

    Bullshit. Anyone who does video encoding will easily max out a Ryzen. Anyone who builds software for a living will max out a Ryzen. In fact, just about anybody who needs more computing power than a Chromebook will max out Ryzen.

    AMD you fucked up big time. Bigly.

    And Phoronix, who are you to say what people should be doing with their machines? People paid for this computational hardware and should expect it to perform as advertised.

    1. Re:Phoronix FAIL by 0123456 · · Score: 3, Informative

      Not to mention that one of the reasons we want more cores in our desktop machines is to speed up C++ compiles by compiling more files in parallel.

    2. Re: Phoronix FAIL by Anonymous Coward · · Score: 0

      I was about to get one for exactly that reason.
      Either it works or it doesn't.
      It sounds Ryzen doesn't work.

    3. Re:Phoronix FAIL by doconnor · · Score: 1

      Video encoding makes heavy use of the SIMD units of the processor, which is a different type of load than compiling, which makes heavy use of the conventional integer logic part of the processor.

    4. Re:Phoronix FAIL by snarfies · · Score: 1

      "Bullshit. Anyone who does video encoding will easily max out a Ryzen. Anyone who builds software for a living will max out a Ryzen. In fact, just about anybody who needs more computing power than a Chromebook will max out Ryzen."

      In other words, Phoronix is 100% correct.

  12. Re:AMD shoots itself in the foot with Windows too by Anonymous Coward · · Score: 1

    Microsoft and Intel don't support Windows 7 with their latest chips either (like that i7 7700 Kaby Lake for example). What's your point?

  13. Because if you read the errata for Intel... by Anonymous Coward · · Score: 1

    and assumed AMD would release a microcode fix (as they usually do) you would realize neither company has been making solid chips for at least 15 if not 20 years, and as they have tried to squeeze every ounce of performance out of, and every optimization into each chip, they've made design compromises that often don't show up until real world workloads.

    Personally, I am pretty sure the AMD segfaults could be handled by either retuning or disabling that nice little 'neural network' frontend, and I am not entirely convinced the segfaulting wasn't some sort of intentional, government-approved corner case for breaking the chips without the need for an overt backdoor (like the existing access to the AMD PSP, for instance).

    Having said all this, I am going to be sticking to either AM3 or AM3+ chips for a few more years, and maybe a dual or quad G34 or an older (me_cleaner compatible) LGA 2011 motherboard, until secure non-x86 alternative processors make it out. Basically everything 'indie' videogame-wise is running on Unity/dotnet/mono now, and most of what is not could be run with machine translation, given Loongson MIPS style x86 translation micro-ops and a 3-4 GHz processor clock.

    The end of x86 is nigh, lost not due to the performance crown, but to the security crown.

    1. Re:Because if you read the errata for Intel... by Anonymous Coward · · Score: 0

      Garbage in the processor, garbage out...

  14. Re:AMD shoots itself in the foot with Windows too by Kremmy · · Score: 1
  15. Re:why would I buy a processor that *might* segfau by epine · · Score: 0

    why would I spend money on something that *might* segfault and for which the vendor isn't going to provide a solution to *everyone*

    You're dreaming if you don't think you run a similar risk with Intel. The only difference here is the proximal news cycle.

    Tomorrow's Market Probably Won't Look Anything Like Today

    The recency bias is pretty simple. Because it's easier, we're inclined to use our recent experience as the baseline for what will happen in the future. In many situations, this bias works just fine, but when it comes to investing and money it can cause problems.

    Well, I suppose there are worse problems in life than paying 20% more for 20% less because of an edge case.

    Unless you make it a habit, and it begins to consume larger fish, like your 401(k).

  16. So far so good by I'm+just+joshin · · Score: 4, Informative

    Anecdote here...

    Ryzen 1700 w/ 64GB running Proxmox and 6 virtual machines - 1 Debian, 1 Gentoo (build machine), 1 pfSense, and 3 Windows.

    Been rock solid doing full world builds on Gentoo, PCI passthrough of a GTX 1070 card to one of the Windows VMs (gaming actually works well), and has only been rebooted once since getting it going. Uptime of 24 days.

    No segfaults.

    It is amazingly fast & quiet. Quite the upgrade from my I7-3770K.

    1. Re:So far so good by jon3k · · Score: 1

      PCI passthrough of a GTX 1070 card to one of the Windows VMs (gaming actually works well),

      I'm currently building a Ryzen Linux box (parts are literally sitting on the desk beside me) and I've been following PCIe passthrough intermittently, mostly via Wendell and Level1Techs. Can you share some details on how you got everything working and any issues you've run into?

    2. Re:So far so good by I'm+just+joshin · · Score: 2

      I mostly followed this: https://pve.proxmox.com/wiki/P.... If you're passing through an NVIDIA GPU, be sure to pull a copy of its BIOS and pass it to KVM.

      In addition, I passed most USB ports, and my PCI-E Soundblaster card to the Windows VM.

      Good luck.

    3. Re:So far so good by jon3k · · Score: 1

      Thanks for your reply, I appreciate it. Have you had good luck with it?

    4. Re:So far so good by I'm+just+joshin · · Score: 1

      It was a pain to get working, but has been fantastic.

      I lose about 5% performance from virtualizing everything. Obviously, when gaming, I have the load from the other stuff turned down.

    5. Re:So far so good by jon3k · · Score: 1

      That's really encouraging, thanks again.

  17. Re:AMD shoots itself in the foot with Windows too by Anonymous Coward · · Score: 0

    ... yet AMD has refused to support Ryzen on Windows 7.

    The correct wording is this: ... yet AMD has refused to support Windows 7 on Ryzen.

    You don't run a CPU on an OS unless you're talking virtualization - and even then most OSes rely on hardware support for the virtualization.

  18. I don't see... by thadtheman · · Score: 1

    what causes the problem or the exact circumstances it happens under.

  19. You guys new or something? by Orgasmatron · · Score: 4, Interesting

    Not (necessarily) a big deal. CPUs have bugs. The kernel, the compilers and the standard libraries are all stuffed full of workarounds for various CPU errors. They are called "errata" and pretty much every CPU has them. (One could argue that corrigendum would be a more appropriate word for them.) Intel has had some big ones; the most memorable (off the top of my head) were F00F and FDIV. The 286 was so riddled with bugs that everyone gave up trying to write a protected mode kernel and just waited for the 386.

    Basically, they'll figure out what is causing the error and how to avoid it. If the workaround is easy, like "have the compiler reorder some instructions", a few patches will go out and life goes on, no big deal.

    If the workaround is less easy, like "don't utilize all cores", or "bump the clock multiplier down to overcome a thermal or electrical issue", that is a much bigger deal. If you don't meet marketing numbers, your choices are refund or replace. Intel spent a half billion dollars replacing CPUs because of the FDIV bug, even though they calculated that most people would never encounter it and it was relatively easy to patch around (but the patch would have been a drag on FPU performance - and marketing again had made promises).

    --
    See that "Preview" button?
    1. Re:You guys new or something? by Anonymous Coward · · Score: 0

      It's Eternal Goddamned September around here. /. was never known for its high-quality discourse, but the quality of discourse has dropped to low-grade hate-addicted-Tumblr-mob level as of late.

    2. Re:You guys new or something? by Misagon · · Score: 2

      The first bug report with a test case that reproduced the bug was submitted to AMD in April, and they have only now acknowledged it.

      And how long would we have to wait for a microcode update?

      --
      "We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
    3. Re:You guys new or something? by Anonymous Coward · · Score: 0

      Probably not that long, if they now know the exact mechanism that causes the problem. Unless they cannot fix it, then they will have to replace the affected CPUs instead.

    4. Re:You guys new or something? by Anonymous Coward · · Score: 0

      The 286 was so riddled with bugs that everyone gave up trying to write a protected mode kernel and just waited for the 386.

      Well, technically there's ELKS, which can run on 286 protected mode:
      https://github.com/jbruchon/elks

  20. Re: AMD shoots itself in the foot with Windows too by Anonymous Coward · · Score: 0

    The correct wording is this: ... yet AMD has refused to support Windows 7 on Ryzen.
    You don't run a CPU on an OS unless you're talking virtualization - and even then most OSes rely on hardware support for the virtualization.

    Are you mentally retarded?

    The OS runs on the CPU.

    It was perfectly worded.

  21. FreeBSD by Anonymous Coward · · Score: 0

    I've been having problems compiling various ports (such as emacs) under released versions of FreeBSD (both 10.3 and 11.1) on a Ryzen 5 (1600). It seems fairly repeatable, although at different places. This might be due to either the most recent bug or the other one mentioned in the Phoronix article.

  22. Re:AMD shoots itself in the foot with Windows too by Anonymous Coward · · Score: 0

    The correct wording is this: ... I'm a big fat moron and you can ignore everything I say.

    FTFY

  23. Intel's first 2 gens of hyperthreading were bust by Anonymous Coward · · Score: 1, Interesting

    It seems that Ryzen's hyperthreading, on Linux, under very rare circumstances, can cause memory errors. And Intel is spending millions flooding every tech forum and tech site with shill propaganda declaring this to be the 'end of the world'.

    But Intel would like you to forget that its first two generations of hyperthreading were so broken, you had to switch it off altogether to do any serious work.

    Hyperthreading needs scheduling to be sane and sympathetic. So no issues on the vastly better coded Windows. Sadly Linux is a joke from a software stability POV. So two threads on one core with inter-dependencies have many possibilities to cause bugs.

    I once had Windows crash rarely when launching video. Turned out that I had a driver (emulating a DVD ROM) that failed to prevent its IRQ driver from 'paging out' under memory 'pressure'. And for some reason playing video had a real chance of grabbing the memory used by the interrupt code. The bug was 100% the fault of the IRQ code. And when I tracked it down, it turned out there was a driver update that fixed that very bug.

    Seems the Linux bug on Ryzen is the same sort of thing. One thread, apparently, has to be an interrupt. The compile load has to be so very taxing that the entire system RAM is under constant load. And I bet my bottom dollar the hopeless Linux coder has failed to flag the interrupt handling code as 'non-paging'. Or the Linux scheduler screws up ring zero ultra-priority interrupt handlers, and lets them 'time out' under pressure.

    Before you say "but Intel works"- WRONG. The person (sponsored by Intel) flooding forums with this 'bug' and the script to trigger it had to change the script code over and over again when users discovered it was triggering the same errors on Intel systems as well. What we know for REAL (as opposed to this fake news) is that certain compile workloads on Intel and AMD cause memory issues if hyperthreading is on. And the reason is certain to be bad linux coding.

    If versions 1, 2, 3, 4, 5 and 6 of the workload script crashed both Intel and AMD, and version 7 so far (so it's claimed) only affects some Ryzen chips, well, the problem is clearly not unique to Ryzen.

    PS again the people responsible for banging on about the issue are sponsored by Intel- and Intel has a very large active bounty for anyone who can 'prove' faults in Ryzen.

  24. Only on Linux by Khyber · · Score: 1, Interesting

    That tells me someone's code is fucked up, not that AMD's processors are screwed. Ain't happening on my Hackintosh, ain't happening on my Windows box.

    Did someone let Grsecurity do the SMT kernel code?

    --
    Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    1. Re:Only on Linux by Dagger2 · · Score: 2

      You could just as easily argue that the fact that Linux works fine on other Ryzen processors, AMD's older processors and Intel's processors, and only segfaults on these specific Ryzen models, tells you that it's these processors that are broken, not Linux.

      Of course (and I shouldn't really have to explain this on Slashdot of all places), neither of these observations actually tells you where the problem is. Doing that involves doing some investigation, and the fact that AMD appear to be accepting blame suggests that they've done the investigation and believe it's their fault.

    2. Re:Only on Linux by Khyber · · Score: 1

      I have an actual background in hardware and software troubleshooting. This is very clearly the sign of bad code, not bad hardware. I've been testing for similar problems under both my Windows and Hackintosh boot partitions, using software compilation tools at a high thread count. Oh, BTW, since Windows 10 has an SMT scheduling problem with Ryzen (but only Windows 10; Windows 7 is unaffected), again this tells me that it's clearly in the newer software implementations, not hardware, as I'm unable to trigger the SMT bug using Slackware and Linux kernel 2.6.39 but can reliably trigger it in any kernel 3 and higher.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    3. Re:Only on Linux by Ash-Fox · · Score: 1

      Try not to take Khyber too seriously.

      One time I reflected on the fact on IRC that there were very few musicians in the furry fandom. He then went and made a story that there are many, but they're on a secret IRC network at a certain domain which wasn't registered. When this was pointed out to him, he then went on a tirade about how his father works for the DoD and had his domain super special secret ninja registration. I pointed out that I couldn't resolve the domain anyway, he then went on to say it was IPv6 only. I checked over IPv6, no DNS resolution, no name servers, nothing.

      But we were wrong and he kept fabricating various credentials.

      Someone else shortly registered the domain in the chat to prove him wrong.

      --
      Change is certain; progress is not obligatory.
    4. Re:Only on Linux by Anonymous Coward · · Score: 0

      I think I reproduced it on Windows with SQL Server doing heavy loads.

    5. Re:Only on Linux by Anonymous Coward · · Score: 0

      I'm using Fastbuild 0.93 on Windows 10 Pro (1703, 15063.483) to do parallel building over the network. The compilers have been crashing on occasion, and exclusively on the network build nodes running these processors. Why do you think it's been crashing for me?

    6. Re:Only on Linux by Khyber · · Score: 1

      Windows 10 Pro has the SMT bug. It's reliably reproducible locally and via network nodes.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    7. Re:Only on Linux by Anonymous Coward · · Score: 0

      Windows 10 Pro has the SMT bug. It's reliably reproducible locally and via network nodes.

      I can't find any posts of people reproducing crashes with the SMT bug though?

      Nor does AMD's official response seem to acknowledge a crash issue at all?

      It doesn't really make sense to me why lower performance would cause compilers to crash? I have nodes with much lower end hardware (approaching 8 years old - We were replacing those progressively with these Ryzen systems until we noticed this issue) that are less performant and aren't crashing?

    8. Re:Only on Linux by Khyber · · Score: 1
      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
  25. Mod Points by bobbuck · · Score: 2
    "I have no idea how this isn't +5."

    Well, the last time I had mod points, I wasted them on comments in a post announcing the invention of the telegraph so don't expect much modding from me.

  26. Re:why would I buy a processor that *might* segfau by lucm · · Score: 1

    Intel has been rock-solid since forever. AMD has been unreliable since forever. If you think this will change today, you're kidding yourself.

    AMD makes gadgets for overclocking enthusiasts and gamers on a budget. There's nothing wrong with that, and they've kept Intel on their toes which is a good thing. But it's not the same class of product unless your focus is only on net gigahertz per dollar.

    Being surprised by this kind of problem is like being surprised that Windows phones home or that HP is fucking you in the ass with their ink cartridges.

    --
    lucm, indeed.
  27. Re:why would I buy a processor that *might* segfau by Anonymous Coward · · Score: 0

    > Intel has been rock-solid since forever.

    The errata for Intel CPUs for the past five+ years indicate that this is not true.

  28. WTF? by johannesg · · Score: 1

    What _else_ would people buy such CPUs for then, if not for "massive workloads"?

    Also, somehow I'm feeling considerable distrust that the OS should be able to somehow 'fix' this. Probably by turning off features until it runs at a fraction of the speed, my guess is...

    Anyway, yesterday I already sent out an email saying "don't buy Ryzen". First time I've ever done that, so well done, AMD.

    1. Re:WTF? by Anonymous Coward · · Score: 0

      Also, somehow I'm feeling considerable distrust that the OS should be able to somehow 'fix' this. Probably by turning off features until it runs at a fraction of the speed, my guess is...

      There are hundreds of bugs in CPUs that are fixed by OSes or microcode without anyone ever noticing. I don't see any reason for this one to be any different.

      Anyway, yesterday I already sent out an email saying "don't buy Ryzen". First time I've ever done that, so well done, AMD.

      Why are you blaming AMD for something you did yourself?

    2. Re:WTF? by Anne+Thwacks · · Score: 1
      What _else_ would people buy such CPUs for then, if not for "massive workloads"?

      Because "shiny" - why do most people buy new processors?

      --
      Sent from my ASR33 using ASCII
  29. Re: AMD shoots itself in the foot with Windows too by Anonymous Coward · · Score: 0

    Correct wording: Microsoft refuses to take patches from AMD or Intel for anything older than Win10.

  30. Intel... by Anonymous Coward · · Score: 0

    Hah. Seems Intel has found a use for its coasters. Hanging out on Slashdot.

  31. Odd reactions coming from slashdot... by Anonymous Coward · · Score: 0

    Somehow Intel having an HT bug and everyone's "hey, look they fixed it" meanwhile AMD has something similar and everyone's "yeah nope, dead architecture. shitty processor."

    If you bothered to look further, it seems to affect only a specific batch of Ryzens and seems to be patchable via microcode.

    1. Re: Odd reactions coming from slashdot... by Anonymous Coward · · Score: 0

      Source?

  32. Something is bugging me about that by dbIII · · Score: 2

    Intel has been rock-solid since forever

    Complete F00F.

  33. Re:why would I buy a processor that *might* segfau by iCEBaLM · · Score: 2

    Intel has been rock-solid since forever.

    https://arstechnica.com/inform...

  34. Re:Intel's first 2 gens of hyperthreading were bus by Anonymous Coward · · Score: 0

    I do believe that this has been confirmed to be the processor's fault.

    In addition;

    The person (sponsored by Intel) flooding forums

    Citation Needed

  35. Re:Intel's first 2 gens of hyperthreading were bus by Anonymous Coward · · Score: 1

    It seems that Ryzen's hyperthreading, on Linux, under very rare circumstances, can cause memory errors. And Intel is spending millions flooding every tech forum and tech site with shill propaganda declaring this to be the 'end of the world'.

    But Intel would like you to forget that its first two generations of hyperthreading were so broken, you had to switch it off altogether to do any serious work.

    Hyperthreading needs scheduling to be sane and sympathetic. So no issues on the vastly better coded Windows. Sadly Linux is a joke from a software stability POV. So two threads on one core with inter-dependencies have many possibilities to cause bugs.

    I once had Windows crash rarely when launching video. Turned out that I had a driver (emulating a DVD ROM) that failed to prevent its IRQ driver from 'paging out' under memory 'pressure'. And for some reason playing video had a real chance of grabbing the memory used by the interrupt code. The bug was 100% the fault of the IRQ code. And when I tracked it down, it turned out there was a driver update that fixed that very bug.

    Seems the Linux bug on Ryzen is the same sort of thing. One thread, apparently, has to be an interrupt. The compile load has to be so very taxing that the entire system RAM is under constant load. And I bet my bottom dollar the hopeless Linux coder has failed to flag the interrupt handling code as 'non-paging'. Or the Linux scheduler screws up ring zero ultra-priority interrupt handlers, and lets them 'time out' under pressure.

    Before you say "but Intel works"- WRONG. The person (sponsored by Intel) flooding forums with this 'bug' and the script to trigger it had to change the script code over and over again when users discovered it was triggering the same errors on Intel systems as well. What we know for REAL (as opposed to this fake news) is that certain compile workloads on Intel and AMD cause memory issues if hyperthreading is on. And the reason is certain to be bad linux coding.

    If versions 1, 2, 3, 4, 5 and 6 of the workload script crashed both Intel and AMD, and version 7 so far (so it's claimed) only affects some Ryzen chips, well, the problem is clearly not unique to Ryzen.

    PS again the people responsible for banging on about the issue are sponsored by Intel- and Intel has a very large active bounty for anyone who can 'prove' faults in Ryzen.

    You seem like someone who has been paid (most likely by MS) to badmouth Linux.

  36. Re:Intel's first 2 gens of hyperthreading were bus by Anonymous Coward · · Score: 1

    Hyperthreading needs scheduling to be sane and sympathetic. So no issues on the vastly better coded Windows. Sadly Linux is a joke from a software stability POV.

    The problem has been reproduced on Windows, using WSL. Also FreeBSD and DragonFlyBSD are affected.

  37. Re:Intel's first 2 gens of hyperthreading were bus by rew · · Score: 1

    Just FYI: On Linux IRQ handlers can never be paged out on a very fundamental level.

    You might think it's useful, but the thinking is that it just MIGHT be the IRQ (kernel memory) for the "get it back from disk" part. So in general stuff like that is never paged out.
    In modern systems you'll probably use maybe 3-10 MB of memory for kernel code. Even if you have little main memory (1 GB), that's still less than 1%. So no reason at all to change this policy.
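
    As a quick check of that arithmetic, here is a minimal Python sketch. The function name is hypothetical, and 1 GB is taken as 1024 MB, which is an assumption about the units meant above.

```python
def kernel_fraction(kernel_mb: float, ram_gb: float) -> float:
    """Return kernel code size as a fraction of total RAM (1 GB = 1024 MB)."""
    return kernel_mb / (ram_gb * 1024)


# Both ends of the 3-10 MB estimate, against 1 GB of main memory.
for kernel_mb in (3, 10):
    print(f"{kernel_mb} MB of 1 GB = {kernel_fraction(kernel_mb, 1):.2%}")
```

    Both ends of the range come out below 1%, and the share only shrinks as main memory grows.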

  38. Re:why would I buy a processor that *might* segfau by cheesybagel · · Score: 1

    Intel has been rock-solid since forever.
    FDIV bug. F00F bug. TSX bug. Hyperthreading bug. Need I continue?

  39. systemd-amd-ryzen-hyperthreading by Anonymous Coward · · Score: 0

    found your problem.
    It's the systemd-amd thingy

  40. Re:why would I buy a processor that *might* segfau by iggymanz · · Score: 1

    My intel processors don't segfault under heavy load. That includes compiler load at home and virtual machine load at my employer. Why would I risk changing that?

  41. Re:Do worry... by Anonymous Coward · · Score: 0

    It doesn't affect current chips, only the first series.

    Untrue, my 2 week old Ryzen build exhibits this problem, also CPUs from multiple batches are affected.