Slashdot Mirror


New HyperThreading Flaw Affects Intel 6th And 7th Generation Skylake and Kaby Lake-Based Processors (hothardware.com)

MojoKid writes: A new flaw has been discovered that impacts Intel 6th and 7th Generation Skylake and Kaby Lake-based processors that support HyperThreading. The issue affects all operating systems and is detailed in Intel's errata documentation, which points out that under complex micro-architectural conditions, short loops of fewer than 64 instructions that use the AH, BH, CH or DH registers, as well as their corresponding wider registers (e.g. RAX, EAX or AX for AH), may cause unpredictable system behavior, including crashes and potential data loss. The OCaml toolchain community first began investigating processors with these malfunctions back in January and found reports dating back to at least the first half of 2016.

The OCaml team was able to pinpoint the issue to Skylake's HyperThreading implementation and notified Intel. While Intel reportedly did not respond directly, it has issued some microcode fixes since then. That's not the end of the story, however, as the microcode fixes also need to be incorporated into BIOS/UEFI updates, and it is not clear at this time whether all major vendors have included these changes in their latest revisions.

18 of 135 comments

  1. Apocryphal .... by brantondaveperson · · Score: 3, Insightful

    .. doesn't mean what the article writer appears to think it means.

    Anyhow, that a new, highly complex processor contains a subtle bug that's fixable without hardware modification isn't exactly earth-shaking news, surely? How about they just fix it, and we move on.

    1. Re:Apocryphal .... by KiloByte · · Score: 2, Informative

      The fix doesn't disable hyperthreading, the fix fixes the bug.

      The fix works only for some models of Skylake (models 78 and 94, stepping 3). On any other Skylakes and all Kaby Lakes there's no way other than disabling hyperthreading entirely.

      A fix might or might not be released in the future; Intel doesn't say a word about the issue.
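
      The model/stepping rule above can be sketched as a quick check against /proc/cpuinfo text. This is only an illustration: the helper names and the sample text are made up here, and restricting the check to CPU family 6 is an assumption, not something stated in the comment.

```python
# Models and stepping for which the comment above says a microcode fix exists.
FIXABLE_MODELS = {78, 94}
FIXABLE_STEPPING = 3
ASSUMED_FAMILY = 6  # assumption: the Intel Core parts discussed here report family 6


def parse_cpuinfo(text):
    """Return (family, model, stepping) from the first processor block."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            # "model" and "model name" are distinct keys after stripping
            fields.setdefault(key.strip(), value.strip())
    return int(fields["cpu family"]), int(fields["model"]), int(fields["stepping"])


def microcode_fix_available(family, model, stepping):
    """True only for the two Skylake models at stepping 3 named above."""
    return (
        family == ASSUMED_FAMILY
        and model in FIXABLE_MODELS
        and stepping == FIXABLE_STEPPING
    )


# Hypothetical sample in the shape of /proc/cpuinfo, for illustration only.
SAMPLE = """\
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 94
model name  : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
stepping    : 3
"""

if __name__ == "__main__":
    print(microcode_fix_available(*parse_cpuinfo(SAMPLE)))
```

      On a real Linux box you would feed it the contents of /proc/cpuinfo instead of SAMPLE. A False result only means the CPU is outside the two listed models; it says nothing about whether the CPU is affected.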

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    2. Re:Apocryphal .... by KiloByte · · Score: 2

      Uhm, nope. Only those two models have a fix issued. On everything else, you do need to disable HT, which obviously takes a massive performance hit.

      Those OCaml guys merely noticed and diagnosed the problem first -- your average mouth-breathing Windows user will have a game crash, a browser corrupt a Twitter page or MS Word lose data yet again, but that's what such users are used to. Yet that their systems are already crashy doesn't mean this extra source of crashes and data corruption doesn't matter.

      As for your legendary "BIOS/UEFI updates", those can be relied on about as much as $YOUR_COUNTRY'S_PARLIAMENT to stop bribes towards their own members. Most of those "system vendors" don't pass fixes, and when they do, they're anything but timely.

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    3. Re:Apocryphal .... by thestuckmud · · Score: 4, Interesting

      hyper-threading ... purportedly boosts performance by about 10-20% for multithreaded apps.

      10-20% might be the case. I've also read some claims of 30% speed boost from hyperthreading. The boost is highly dependent on workload.

      For me, 100% is often the case. I do a lot of tight number-theoretic math loops and was astounded to find that one of my typical computations -- with little memory use and essentially no communication -- was 8 times quicker on 4 cores/8 threads compared to the single-threaded version. Perfectly efficient! Your mileage will very probably vary, but it works for me.

      And, FWIW, I usually prototype in OCaml :-)
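
      The arithmetic behind the "100%" figure, as a small sketch. The function names are made up for illustration, and the hyperthreading gain assumes the four physical cores alone would have scaled perfectly linearly, which is an idealization.

```python
def parallel_efficiency(speedup, hardware_threads):
    """Fraction of ideal linear scaling achieved across all hardware threads."""
    return speedup / hardware_threads


def ht_gain(speedup, physical_cores):
    """Extra throughput attributed to hyperthreading.

    Assumes the physical cores alone would give exactly `physical_cores`x,
    so anything beyond that is credited to the second thread per core.
    """
    return speedup / physical_cores - 1.0


if __name__ == "__main__":
    # Figures from the comment: 8x faster on 4 cores / 8 threads.
    print(parallel_efficiency(8, 8))  # 1.0, i.e. perfectly efficient
    print(ht_gain(8, 4))              # 1.0, i.e. a 100% boost from HT
```

      By the same formula, the 10-20% quoted earlier would correspond to a 4.4x to 4.8x speedup on the same 4-core machine.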

    4. Re:Apocryphal .... by swilver · · Score: 3, Interesting

      I looked into HT a bit, and its performance gains.

      Basically, it comes down to this: as soon as you have real cores available, HT barely does anything and sometimes even becomes detrimental to performance. So if you have 1 core, HT shows some real benefits. With 2 cores it was pretty marginal, and with 4 cores or more you might as well disable it.

    5. Re:Apocryphal .... by arglebargle_xiv · · Score: 5, Funny

      If you don't know the model name of your processor(s), the command below will tell you their model names. Run it in a command line shell (e.g. xterm):
      grep name /proc/cpuinfo | sort -u

      C:\>grep name /proc/cpuinfo | sort -u
      'grep' is not recognized as an internal or external command, operable program or batch file.
      C:\>

    6. Re:Apocryphal .... by zwarte+piet · · Score: 4, Funny

      Your terminal is throwing weird smilies at you. I think it is cross with you.

  2. BTW, AMD has a similar bug too by Misagon · · Score: 5, Interesting

    AMD Ryzen also seems to have a similar bug related to hyperthreading, one that happens only in very special circumstances.

    Quite a few Ryzen users have experienced instability problems during heavy compilation loads under Linux, especially those using compile-based distros such as Gentoo, but also under the Ubuntu subsystem on Windows.
    There has been some debate whether the problems would have been caused by an actual bug, or if the people who experienced them simply had an unstable overclock - the latter being something that has also cropped up in forums recently.

    Matthew Dillon, of Dragonfly BSD fame (and Amiga fame before that...) does believe that he has found a reproducible bug. He sent a test case about it to AMD in April.
    This is not the first time Dillon has found a hardware bug in an AMD CPU. He found one in an earlier AMD CPU back in 2012, which was fixed in a microcode update.

    I expect this to be fixed in a BIOS/microcode update soon, if not already in AGESA 1.0.0.6 - but I have yet to see any confirmation that it would have been fixed.

    --
    "We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
    1. Re:BTW, AMD has a similar bug too by gweihir · · Score: 3, Insightful

      The difference is that Ryzen is a new architecture, where this is sort-of expected. Intel has this in an old architecture and that is just not acceptable.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:BTW, AMD has a similar bug too by gravewax · · Score: 2

      WTF? No, a bug like this is NOT expected in a new architecture; such bugs can result in billions in losses, something AMD can't afford. Having said that, people overclocking has always been a problem when it comes to stability.

    3. Re:BTW, AMD has a similar bug too by slashdot_commentator · · Score: 2

      > The difference is that Ryzen is a new architecture, where this is sort-of expected.

      No. In the sense that this is a hardware issue, not a software issue. Hyperthread issues are not expected because the hardware is "new". Software "engineering" is a joke compared to the rigor required in hardware development. I know you may not understand this, but (successfully employed) engineers of hardware are not allowed to fuck up. Manufacturing companies spend 1000x more in design, expertise, and quality control to make sure hardware doesn't have any flaw. Because if hardware does have a flaw, it's hardware -- it can't be fixed with a software patch.

      > Intel has this in an old architecture and that is just not acceptable.

      With a "new" architecture, it cannot fall back on years of previous versions of hardware to expose a bug. That does not mean Intel would be immune to subtle forms of design flaws, since every new version of chip means having done something new on the microcode level. Otherwise, it would be meaningless to be adding tens of thousands more transistors to do the job exactly as the previous version of chip.

      It is blowing my mind right now how so many gaming enthusiasts and other amateur computer users fail to grasp the gravity of this hardware flaw. Yes, a change can be made in the CPU's microcode to avoid the flaw, but it's not like Intel microcode can be rewritten from scratch. A patch on the BIOS level only means that hyperthreading is disabled. That means the extra hundreds of dollars spent to make your PC 10-30% faster than an i5 have been pissed away; your i7 just lost its performance edge to an i5. If something is actually modified in the i7's microcode, it's more about kludging the intended hyperthread operation to avoid the logic bug (which means your i7 loses performance).

      Fortunately, a program crashing or generating incorrect data is pretty meaningless to gamers and websurfers. But if I was an investment bank, bitcoiner, scientist, or engineering company dependent on computed output, I'd be extremely pissed off. It makes me wonder if Intel is going to have to do a product recall (or perhaps an extremely limited rebate to select industry customers). AMD really should figure out a smart way to capitalize on Intel's foobar.

      --
      There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
    4. Re:BTW, AMD has a similar bug too by Rockoon · · Score: 2

      Not at the same magnitude that they are today.

      Sure, lots of little things like the cpu will crash in some cases if a second level shadow of the carry flag register is set immediately after some other thing... so the fix is for the microcode to reorder the operations a little bit so that operations that target the carry flag are on the shallow side of the shadow registers... or at least never in that 2nd slot.. shit like that aint nothing

      It is the parallel execution itself that isn't working here. Wrong results are produced. It isn't that performance is bad, or a core locks up... instead bad values are produced. It's a horrible bug, at least as bad as FDIV.

      --
      "His name was James Damore."
  3. Re:Zen for the win! by rudy_wayne · · Score: 5, Funny

    All you losers with your over-priced Intel crap.

    I've used nothing but AMD for 20 years and I have absolutely no probl%#^$^%J NJasllodofufm DUDFUF&&()()FDJJDNDMS .......

  4. What’s your point, exactly? by Picodon · · Score: 4, Funny

    Are you complaining about the topic as being too insignificant to deserve an article (as in: no need to tell people that they may want to update their servers) or are you preemptively commenting that other readers shouldn’t bother to comment on such an insignificant topic?

  5. OCaml by Cochonou · · Score: 2

    It's a bit paradoxical that it was the OCaml team who found this bug, whereas OCaml is notoriously bad at parallelism.

  6. Inaccurate article and comments by Anonymous Coward · · Score: 4, Informative

    There are a lot of inaccurate comments here. First of all, flashing a new BIOS/system firmware may be the best solution for most users; however, it is not the only solution. If you know how, you can hot-load the firmware in Linux and, I suspect, in other OSes.

    For example, I downloaded the latest firmware from Intel (dated 10 May) and placed it in /lib/firmware. Then running:

    echo 1 > /sys/devices/system/cpu/microcode/reload

    was enough. In the log is an entry:

    [2246029.695843] microcode: updated to revision 0xba, date = 2017-04-09
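
    To confirm the reload took effect from a script, the log entry above can be parsed for the revision and date. A minimal sketch: the regular expression is written against this one quoted line only, so other kernel versions may word the message differently, and the function name is made up for illustration.

```python
import re
from datetime import date

# Pattern derived from the single log line quoted above (an assumption
# about the message format, not a documented kernel interface).
LOG_RE = re.compile(
    r"microcode: updated to revision (0x[0-9a-fA-F]+), "
    r"date = (\d{4})-(\d{2})-(\d{2})"
)


def parse_microcode_update(line):
    """Return (revision, build_date) from a kernel log line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    revision = int(match.group(1), 16)
    year, month, day = (int(match.group(i)) for i in (2, 3, 4))
    return revision, date(year, month, day)


if __name__ == "__main__":
    line = "[2246029.695843] microcode: updated to revision 0xba, date = 2017-04-09"
    revision, built = parse_microcode_update(line)
    print(hex(revision), built.isoformat())
```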

    In addition, the article points to a message on the debian-devel (not users) mailing list. This indicates that i3/5/7 processors with hyperthreading are affected. AFAIK, no i5 processors have hyperthreading, even though the family/model/stepping on my system is indicated in the message as vulnerable.

    CPU(s): 4
    On-line CPU(s) list: 0-3
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s): 1

    Well what is it? Hyperthreading or all skylake/kaby lake? Curious minds want to know.
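
    The listing above is in the shape of lscpu output, and the "Thread(s) per core" field is the one that shows whether hyperthreading is active. A minimal sketch of reading it, with made-up helper names; note it only reports whether more than one thread per core is currently enabled, not whether the CPU model is on the affected list.

```python
# The five fields quoted in the comment above.
LSCPU_SAMPLE = """\
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
"""


def parse_lscpu(text):
    """Turn 'Key: value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def hyperthreading_active(fields):
    """True when each core exposes more than one hardware thread."""
    return int(fields["Thread(s) per core"]) > 1


def logical_cpus(fields):
    """Threads per core x cores per socket x sockets."""
    return (
        int(fields["Thread(s) per core"])
        * int(fields["Core(s) per socket"])
        * int(fields["Socket(s)"])
    )


if __name__ == "__main__":
    info = parse_lscpu(LSCPU_SAMPLE)
    print(hyperthreading_active(info), logical_cpus(info))
```

    For the system quoted above this gives one thread per core and four logical CPUs, matching the CPU(s) line, so hyperthreading is not in use there.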

    One last thing. The current firmware package is dated May 10 -- seven weeks ago. The firmware itself was produced April 9 -- 11 weeks ago. Unless Intel has not updated yet for this, many posters here are running around with their hair on fire about something already fixed.

    But I guess that is normal for slashdot.

  7. Re:Should have went with AMD by sqorbit · · Score: 2

    AMD: The Quality Goes In Before the Name Goes On.

    See earlier comment about how AMD has a very similar bug.

    --
    Sent from my TARDIS
  8. Re: As well as what? by Anonymous Coward · · Score: 2, Interesting

    Different AC, but regardless, you are wrong and he is right. All of the CPUs can be fixed/updated via microcode; however, for some models that haven't had publicly available fixes published, you have to go to your vendor and ask them for it. That doesn't mean it requires them to do it, but they are the only ones that will currently have the updates.