Slashdot Mirror


Linus Torvalds Calls Intel Patches 'Complete and Utter Garbage' (lkml.org)

An anonymous reader writes: On the Linux Kernel Mailing List, Linus Torvalds ended up responding to a long-time kernel developer (and former Intel engineer) who'd been describing a new microcode feature addressing Indirect Branch Restricted Speculation "where a future CPU will advertise 'I am able to be not broken' and then you have to set the IBRS bit once at boot time to *ask* it not to be broken."

Linus calls it "very much part of the whole 'this is complete garbage' issue. The whole IBRS_ALL feature to me very clearly says 'Intel is not serious about this, we'll have a ugly hack that will be so expensive that we don't want to enable it by default, because that would look bad in benchmarks'. So instead they try to push the garbage down to us. And they are doing it entirely wrong, even from a technical standpoint. I'm sure there is some lawyer there who says 'we'll have to go through motions to protect against a lawsuit'. But legal reasons do not make for good technology, or good patches that I should apply."

Later, Linus says forcefully that these "complete and utter garbage" patches are being pushed by someone "for unclear reasons", and adds another criticism: "The whole point of having cpuid and flags from the microarchitecture is that we can use those to make decisions. But since we already know that the IBRS overhead is huge on existing hardware, all those hardware capability bits are just complete and utter garbage. Nobody sane will use them, since the cost is too damn high. So you end up having to look at 'which CPU stepping is this' anyway. I think we need something better than this garbage."
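To make the capability-bit argument concrete, here is a minimal Python sketch of the decision a kernel is left with at boot. It assumes Intel's documented MSR layout (IBRS_ALL is bit 1 of IA32_ARCH_CAPABILITIES, MSR 0x10A; IBRS is bit 0 of IA32_SPEC_CTRL, MSR 0x48). The function `choose_mitigation`, the `SLOW_IBRS_CPUS` table and its entries are hypothetical and are not taken from the kernel or from the patches under discussion.

```python
# Hypothetical sketch of a boot-time Spectre v2 mitigation choice.
# Bit positions assume Intel's documented MSR layout; the CPU table is invented.

ARCH_CAP_IBRS_ALL = 1 << 1  # IA32_ARCH_CAPABILITIES (MSR 0x10A), bit 1: IBRS_ALL
SPEC_CTRL_IBRS = 1 << 0     # IA32_SPEC_CTRL (MSR 0x48), bit 0: IBRS

# Hypothetical (vendor, model, stepping) entries where IBRS is known to be too slow.
SLOW_IBRS_CPUS = {("GenuineIntel", "model-A", "stepping-1")}


def choose_mitigation(arch_capabilities: int, cpu: tuple) -> tuple:
    """Return (mitigation name, value to write to SPEC_CTRL once at boot)."""
    if arch_capabilities & ARCH_CAP_IBRS_ALL and cpu not in SLOW_IBRS_CPUS:
        # The CPU advertises "I am able to be not broken": set IBRS once and leave it.
        return ("ibrs_all", SPEC_CTRL_IBRS)
    # Either the bit is absent, or the stepping table says IBRS costs too much:
    # fall back to a software mitigation such as retpolines.
    return ("retpoline", 0)


print(choose_mitigation(0b10, ("GenuineIntel", "model-B", "stepping-1")))
print(choose_mitigation(0b10, ("GenuineIntel", "model-A", "stepping-1")))
```

The fallback branch is the "which CPU stepping is this" lookup Linus describes: once a capability bit cannot be trusted to mean "cheap enough to enable", a per-stepping table ends up making the real decision.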

12 of 507 comments

  1. Is there any other option, Linus? by aglider · · Score: 5, Interesting

    You are right, Linus, as usual.

    But I'd prefer the Linux Kernel Development team to put a complete proposal on the table.
    For example, totally dropping support for Intel CPUs starting with the release on March 1st (or, better, April?).

    --
    Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
    1. Re:Is there any other option, Linus? by gravewax · · Score: 4, Interesting

      And how exactly does that do anything at all to improve the situation? Or are you suggesting open-source hardware would somehow magically be free of design flaws?

    2. Re:Is there any other option, Linus? by cas2000 · · Score: 4, Interesting

      By doing this, it magically becomes the operating system's fault that the CPUs are insecure by design.

      "We documented how OS vendors could turn on the secure mode and cripple performance at the same time. They chose not to use it, so any security flaws are their fault."

    3. Re:Is there any other option, Linus? by hcs_$reboot · · Score: 5, Interesting

      The reason practically every processor has the same issues is that the same optimizations we used to make processors faster had the same fundamental design error.

      I mean, either someone designed the core branch predictor block and everyone worldwide copied it for every processor, or everyone implemented it differently, yet they all have the same Spectre flaw, implying that the flaw is inherent in the way branch predictors work.

      No. The fix is to not load data from memory into the CPU cache during speculative execution when that block of data is not already there. Changing this in the CPU's core would solve both Spectre and Meltdown at a reasonable cost (it would not defeat many current optimizations).
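      A toy Python model can illustrate the proposed policy. It is not a model of any real CPU: the "cache" is a set of line numbers, a timing measurement is replaced by a membership check, the 64-byte line size is an assumption, and `fill_on_speculative_miss` is a hypothetical switch standing in for the change suggested above. In this model, a flush+reload probe recovers the secret when speculative misses fill the cache, and recovers nothing when they don't.

```python
# Toy model only: a "cache" is a set of line numbers, and a timing
# measurement is replaced by a direct membership check.

LINE = 64  # assumed cache-line size in bytes


class ToyCache:
    def __init__(self, fill_on_speculative_miss: bool) -> None:
        self.lines = set()
        # Hypothetical switch for the proposed change: when False, a
        # speculative load that misses does not bring the line into the cache.
        self.fill_on_speculative_miss = fill_on_speculative_miss

    def load(self, addr: int, speculative: bool = False) -> None:
        line = addr // LINE
        if line in self.lines:
            return  # hit: the cache state does not change
        if speculative and not self.fill_on_speculative_miss:
            return  # proposed policy: the miss leaves no trace in the cache
        self.lines.add(line)

    def is_fast(self, addr: int) -> bool:
        # Stand-in for timing a reload: cached lines are "fast".
        return addr // LINE in self.lines


def flush_reload(secret: int, cache: ToyCache, n_values: int = 16):
    """Return the value the attacker recovers, or None if nothing leaks."""
    cache.lines.clear()  # attacker flushes the probe array
    # A mispredicted branch makes the victim speculatively read probe[secret * LINE];
    # the register results are rolled back, but the cache state is not.
    cache.load(secret * LINE, speculative=True)
    hits = [v for v in range(n_values) if cache.is_fast(v * LINE)]
    return hits[0] if hits else None


print(flush_reload(11, ToyCache(fill_on_speculative_miss=True)))
print(flush_reload(11, ToyCache(fill_on_speculative_miss=False)))
```

      In this model the policy only changes what happens on a speculative miss; a speculative hit on a line that is already cached leaves the cache unchanged either way, which matches the restriction to data that is "not there already".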

      --
      Slashdot, fix the reply notifications... You won't get away with it...
    4. Re: Is there any other option, Linus? by LordKronos · · Score: 5, Interesting

      Even invalidating the loaded cache lines isn't necessarily sufficient. Because loading one line means evicting another, it may be possible to run Spectre in the opposite direction: preload the cache, and if any of the preloaded lines become slower to access, you can infer that the speculatively executed code caused them to be evicted. At least in theory. In practice that becomes more difficult in a multiprocess environment, where other processes could be responsible for the eviction, but I certainly wouldn't want to predict that it isn't possible.

      So the full solution may need to be more complex. Just as the CPU includes more physical registers than the architecture specifies, so it can do scratch work in those extra registers and then roll it back without affecting the architectural registers, the CPU may need extra cache lines so that it can load a line speculatively and then discard it without having evicted any of the previously loaded lines.
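      The two paragraphs above can be sketched with a toy model of a single 4-way cache set with LRU replacement (Python; the tags, the set size and the `mode` names are hypothetical, and real caches are far more complicated). In "invalidate" mode the speculative line is removed after the squash, yet a prime+probe pass still sees the line it evicted; in "buffer" mode speculative fills go to a separate buffer that is discarded, and the probe sees nothing.

```python
from collections import OrderedDict


class CacheSet:
    """One set of a toy set-associative cache with LRU replacement."""

    def __init__(self, ways: int = 4) -> None:
        self.ways = ways
        self.lines = OrderedDict()  # least recently used first

    def access(self, tag: str) -> bool:
        """Touch a line; return True on a hit. A miss evicts the LRU line if the set is full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)
        self.lines[tag] = None
        return False


def prime_probe(mode: str, ways: int = 4) -> list:
    """Return the attacker lines that miss on the probe pass.

    mode "invalidate": the speculative line is removed again after the squash.
    mode "buffer": speculative fills go to a side buffer discarded on squash.
    """
    cset = CacheSet(ways)
    attacker = [f"A{i}" for i in range(ways)]
    for tag in attacker:  # prime: fill every way with attacker lines
        cset.access(tag)
    if mode == "invalidate":
        cset.access("victim")  # secret-dependent speculative load maps to this set
        del cset.lines["victim"]  # undo the fill; the line it evicted stays evicted
    elif mode != "buffer":
        raise ValueError(f"unknown mode: {mode}")
    # In "buffer" mode the set is never touched, so every probe hits.
    # Probe newest-first: with LRU the speculative fill evicted the oldest line,
    # so the hits come first and the single miss does not cascade.
    return [tag for tag in reversed(attacker) if not cset.access(tag)]


print(prime_probe("invalidate"))
print(prime_probe("buffer"))
```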

      Or, alternatively, approach the problem from the opposite direction. The problem is caused not just by speculative execution but also because (for performance reasons) the OS leaves kernel memory, and through it effectively all process memory, mapped into every process's address space and then relies on permission bits to make that memory inaccessible. The other fix is to find a way to redesign virtual memory so that other processes' memory is NOT mapped into each process's address space and is thus truly inaccessible. But that may be an even more difficult solution to implement.

    5. Re:Is there any other option, Linus? by Antiocheian · · Score: 3, Interesting
      There's no bloat in FFmpeg, but that's the exception rather than the rule. In Niedermayer's words:

      Old school: Use the lowest level language in which you can solve the problem conveniently.
      New school: Use the highest level language in which the latest supercomputer can solve the problem without the user falling asleep waiting.

      I think you'll agree that the new school is the majority.

    6. Re:Is there any other option, Linus? by Anonymous Coward · · Score: 2, Interesting

      I also want fast machines for scientific data processing. We have better languages and better optimizers than were practical two decades ago and I like it.

      And if Fran Allen's opinion is of any weight, you actually would have had the same languages and optimizers two decades ago if it weren't for C ruining the compiler landscape in the 1970s.

      ASM programmer here.

      Modern CPUs are so totally fucked up: with layers upon layers of abstraction, the task of real optimization becomes more and more hopeless.

    7. Re:Is there any other option, Linus? by mwvdlee · · Score: 5, Interesting

      Linus seems to (begrudgingly) accept the need for a temporary fix, and there is already a temporary fix that works for current CPUs.
      The problem is that Intel calls it a permanent fix, implying that every future CPU will be insecure by default unless the OS flips a switch.
      That way Intel can blame any performance issues on the OS and still pretend their CPU is fast, even though it isn't when running in the secure mode that no sane person would ever use.

      How about a car analogy:

      Imagine all cars have two bugs in the gearbox that trigger when you put the car in reverse in certain ways.

      Bug 1 makes a dashboard light blink once.
      All car manufacturers have this bug, and they all fixed it once it was found.

      Bug 2 makes your car explode.
      AMD and ARM knew about this and fixed it. It made their cars a bit slower, but at least they wouldn't explode.
      Intel knew about it too, but they chose to ignore it. Their cars are a bit faster because of this.
      Intel fixed this by sending out a widget that stops the car from exploding; this widget does make Intel cars go slower.
      The widget doesn't fix it automatically, though! The driver has to switch the widget on every time he starts the car. If the driver doesn't switch the widget on, putting the car in reverse will still make it explode.
      Intel also says that this is how all future cars will be prevented from exploding: by adding this widget to every future car and requiring the driver to switch it on. It'll always be in "explode-on-reverse" mode by default.
      Intel does get to claim their car is faster by default though. Just don't put it in reverse.

      As a bonus analogy: Intel claims both bugs are the same because they are both triggered by the same action, and that therefore all car manufacturers are vulnerable to the exploding-car bug.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
  2. What is going on here...? by DrTJ · · Score: 4, Interesting

    From the email correspondence, Linus says to Mr. Woodhouse:

    "As it is, the patches are COMPLETE AND UTTER GARBAGE.

    They do literally insane things. They do things that do not make
    sense. That makes all your arguments questionable and suspicious. The
    patches do things that are not sane.

    WHAT THE F*CK IS GOING ON?"

    In the post, Linus doesn't go into much technical detail (he just mentions "garbage MSR writes", whatever that means), but his bullshit detector goes off big time.

    It is clear that he thinks the patches are suboptimal, but that in itself can't be a first in Linux kernel history. There seems to be something else behind it; otherwise, why would he ask the "WHAT THE F*CK IS GOING ON" question? Why does he play the "questionable" and "suspicious" card? Does he think that there is something shady going on at Intel that goes beyond the technical stuff?

    Can anyone shed some light?

  3. Re:Linus Haiku by AmiMoJo · · Score: 4, Interesting

    So I'm gonna submit his email as evidence in my small claims court action against Intel.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  4. Don't Bet On Malice When Stupidity Will Do? by ytene · · Score: 5, Interesting

    You make some really interesting points about retpoline, but I wonder whether this latest move from Intel fails to account for it because they are being disingenuous, or because they continue to be a bunch of idiots?

    We're seeing similar problems with other long-established technologies, such as Windows (with Windows 10). Things that worked for decades up until W10 are breaking, or they are breaking in new and frustrating ways.

    For example, I have a triple-screen setup, and by using removable SSDs in a caddy unit I can boot my computer into two different W10 instances as well as multiple Linux builds. The two W10 instances behave in completely different ways, despite being set up by me with EXACTLY the same (scripted) approach. On one of them the taskbar keeps relocating itself around the desktop; on the other it stays put. I've gone back and forth with Microsoft, and they don't know why...

    At the root of the problem, I suspect, is something they changed in W10, written by someone no longer at the company, possibly poorly documented and possibly with unknown consequences.

    Maybe Intel are having similar issues... A decision was made a very long time ago to do something insecure and stupid with speculative execution, but the person who made that decision is no longer with the company, so a new team are trying to fix it and simply don't know what they're doing...

    I honestly don't know what the source is, but I do know that I am seeing "existing" functionality break with much greater frequency on core platforms like this. It just smacks of carelessness...

  5. Re:and your solution is? by drinkypoo · · Score: 5, Interesting

    we must fix things with what is possible, no matter how ugly.

    Intel went straight to ugly, and did not satisfactorily explore the realm of the possible. Linus perceived this, and announced it to the world. The ball is now in Intel's court. They can be responsible and competent, or the whole world can know that they are the fuckups that they are. It's their call.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"