Linus Torvalds Calls Intel Patches 'Complete and Utter Garbage' (lkml.org)

Posted by EditorDavid on Sunday January 21, 2018 @07:44PM from the torn-up-by-Torvalds dept.

An anonymous reader writes: On the Linux Kernel Mailing List, Linus Torvalds ended up responding to a long-time kernel developer (and former Intel engineer) who'd been describing a new microcode feature addressing Indirect Branch Restricted Speculation "where a future CPU will advertise 'I am able to be not broken' and then you have to set the IBRS bit once at boot time to *ask* it not to be broken."

Linus calls it "very much part of the whole 'this is complete garbage' issue. The whole IBRS_ALL feature to me very clearly says 'Intel is not serious about this, we'll have a ugly hack that will be so expensive that we don't want to enable it by default, because that would look bad in benchmarks'. So instead they try to push the garbage down to us. And they are doing it entirely wrong, even from a technical standpoint. I'm sure there is some lawyer there who says 'we'll have to go through motions to protect against a lawsuit'. But legal reasons do not make for good technology, or good patches that I should apply."
Later Linus says forcefully that these "complete and utter garbage" patches are being pushed by someone "for unclear reasons" -- and adds another criticism: "The whole point of having cpuid and flags from the microarchitecture is that we can use those to make decisions. But since we already know that the IBRS overhead is huge on existing hardware, all those hardware capability bits are just complete and utter garbage. Nobody sane will use them, since the cost is too damn high. So you end up having to look at 'which CPU stepping is this' anyway. I think we need something better than this garbage."
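
To make the complaint about capability bits concrete, here is a minimal Python sketch of the decision a kernel would face. The bit positions are the ones Intel documents for CPUID leaf 7 (EDX) and the IA32_ARCH_CAPABILITIES MSR; the function name, the returned strings, and the `ibrs_is_cheap` parameter are hypothetical and only stand in for per-model or per-stepping knowledge that the flags themselves do not convey.

```python
# Bit positions from Intel's public documentation of the speculation controls.
CPUID_7_EDX_SPEC_CTRL = 1 << 26          # IBRS/IBPB supported
CPUID_7_EDX_ARCH_CAPABILITIES = 1 << 29  # IA32_ARCH_CAPABILITIES MSR exists
ARCH_CAP_IBRS_ALL = 1 << 1               # IBRS can be set once and left on


def pick_spectre_v2_mitigation(cpuid_7_edx, arch_capabilities, ibrs_is_cheap):
    """Hypothetical policy: return the name of the mitigation to use.

    ibrs_is_cheap is an assumption of this sketch. It represents knowledge
    about the specific CPU model or stepping, not anything the CPU advertises.
    """
    has_arch_cap = bool(cpuid_7_edx & CPUID_7_EDX_ARCH_CAPABILITIES)
    ibrs_all = has_arch_cap and bool(arch_capabilities & ARCH_CAP_IBRS_ALL)
    if ibrs_all and ibrs_is_cheap:
        return "ibrs_all"
    # Fall back to a software mitigation such as retpoline.
    return "retpoline"


flags = CPUID_7_EDX_SPEC_CTRL | CPUID_7_EDX_ARCH_CAPABILITIES
print(pick_spectre_v2_mitigation(flags, ARCH_CAP_IBRS_ALL, ibrs_is_cheap=True))
print(pick_spectre_v2_mitigation(flags, ARCH_CAP_IBRS_ALL, ibrs_is_cheap=False))
```

With identical capability bits, the answer still changes with `ibrs_is_cheap`, which is the point of the criticism: the advertised flags alone are not enough to decide.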

Is there any other option, Linus? by aglider · 2018-01-21 19:52 · Score: 5, Interesting

You are right, Linus, as usual.
But I'd prefer the Linux kernel development team to put a complete proposal on the table.
Like totally dropping support for Intel CPUs starting with the releases from next March 1st (or better, April?).

--
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
1. Re:Is there any other option, Linus? by gravewax · 2018-01-21 20:19 · Score: 4, Interesting
  
  and how exactly does that do anything at all to improve the situation? or are you suggesting Open source hardware would somehow be magically design flaw free?
2. Re: Is there any other option, Linus? by Anonymous Coward · 2018-01-21 20:47 · Score: 0, Interesting
  
  I wasn't thinking ditch Intel support, as I see these bugs as being a kernel design problem. I made the same kernel design "decisions" in my own systems.
  The problem is simple :
  A) do you want the performance gains from the added intelligence in the CPU?
  B) Do you want a more secure CPU which is MUCH slower.
  No, AMD has the same problem... at least across threads. And encrypting memory across threads might be kinda interesting but will make modern programming a sheer terror.
  The solution to the problem is that the OS flushes the pipeline on context switches. This comes at a high cost. So the kernel can instead selectively implement security that identifies whether the task switch justifies the flush... which it should... but also comes at a high cost.
  Now, the solution can be much better if there is a multipronged approach.
  1) in virtualized environments, flush the pipeline when switching between security domains... I.E. VMs, containers...
  2) In desktop environments, depend on :
  A) Software IPS that identifies malicious code based on signatures (Windows SmartScreen for example)
  B) Requires apps to be signed to run (Windows 10S/Mac OS)
  C) Require JITs to ensure that code can't be generated that can exploit the pipeline. This is easily done.
  Linus is bitching about Intel... but I see this as an excellent opportunity to make our systems better by improving our coding practices.
  If the suggestions I made above were implemented, the flag he's complaining about would be merited as the OS and the software would be hardened with little or no performance impact.
  Flushing the pipeline between VMs/Containers will still come at a cost, but since VMs and Containers are generally a bad idea anyway, and we have always solved VM/container problems by throwing more computing power and bandwidth at the problem, this doesn't bother me.
  Also consider that if a data center is designed properly for a small organization, three chassis of three micro-blades (or 9 NUCs and 5 gigabit switches) should be more than enough to move everything back home and not fear the cloud. Then, we could stop fearing bugs altogether.
  These vulnerabilities are far deeper than the CPU. Intel has a bandaid in place until the software is improved enough to not need it. So... Linus can quit bitching and start working on a real solution. The Chrome, Edge and Mozilla guys have already started.
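
The selective-flush idea in the comment above (flush only when a switch crosses a security domain such as a VM or container boundary) can be sketched as a toy Python simulation. This is only an illustration under stated assumptions: the function `count_flushes`, the task names, and the domain labels are all hypothetical, and a real kernel would be flushing or fencing predictor state rather than counting.

```python
def count_flushes(schedule, domain_of, flush_always=False):
    """Count flushes for a sequence of scheduled tasks.

    schedule  -- list of task names in the order they run
    domain_of -- maps each task to its security domain (VM, container, ...)
    """
    flushes = 0
    previous = None
    for task in schedule:
        if previous is not None and task != previous:
            crosses_domain = domain_of[task] != domain_of[previous]
            if flush_always or crosses_domain:
                flushes += 1
        previous = task
    return flushes


schedule = ["a1", "a2", "a1", "b1", "b2", "a1"]
domain_of = {"a1": "vm-a", "a2": "vm-a", "b1": "vm-b", "b2": "vm-b"}

print(count_flushes(schedule, domain_of, flush_always=True))  # 5
print(count_flushes(schedule, domain_of))                     # 2
```

In this made-up schedule, flushing on every switch costs five flushes, while flushing only on domain crossings costs two. How much that saves in practice depends entirely on the workload.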
3. Re:Is there any other option, Linus? by cas2000 · 2018-01-21 21:23 · Score: 4, Interesting
  
  by doing this it magically becomes the operating system's fault that the CPUs are insecure by design.
  "we documented how OS vendors could turn on the secure mode and cripple performance at the same time. they chose not to use it, so any security flaws are their fault".
4. Re:Is there any other option, Linus? by hcs_$reboot · 2018-01-21 23:19 · Score: 5, Interesting
  
  The reason practically every processor has the same issues is because the same optimizations we used to make processors faster had the same fundamental design error.
  I mean, either someone designed the core branch predictor block and everyone worldwide copied it for every processor, or everyone implemented it differently, yet it has the same Spectre flaw, implying that the flaw is inherent in the way branch predictors work.
  No. The fix is to not read from memory into the CPU cache during speculative execution when that block of data is not already there. Changing this in the CPU's core would solve both Spectre and Meltdown at a reasonable cost (it would not defeat many current optimizations).
  
  --
  Slashdot, fix the reply notifications... You won't get away with it...
5. Re: Is there any other option, Linus? by LordKronos · 2018-01-21 23:43 · Score: 5, Interesting
  
  Even invalidating the loaded cache pages isn't necessarily sufficient. Because the act of loading one page means the flushing of another page, it may be possible to then do Spectre in the opposite direction: preload the cache, and if any preloaded pages become slower to access, then you can determine that the branch predictor caused them to be flushed. At least in theory. In practice that becomes more difficult in a multiprocess environment where other processes could be responsible for the flush, but I certainly wouldn't want to predict it isn't possible.
  So the full solution may need to be more complex. Just like the CPU includes more registers than the architecture specifies so it can do scrap work in these extra registers and then roll it back without affecting the real registers, the CPU may need extra cache pages so that it can load a page and then flush it without having lost any of the previously loaded pages.
  Or alternatively, approach the problem from the opposite perspective. The problem is caused not just by speculative execution but also because (for performance reasons) the OS leaves all process memory mapped into every process's address space and then uses permissions to try to make that memory unavailable. The other fix is to find a way to redesign virtual memory so that other processes' memory is NOT mapped into each other's memory space and is thus truly inaccessible. But that may be an even more difficult solution to implement.
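
The "opposite direction" attack described at the start of this comment is usually called prime and probe. Below is a minimal Python sketch on a toy direct-mapped cache. It assumes cache lines rather than pages, models a miss as "slow", and all class, function, and parameter names are hypothetical. It also models the case where the CPU invalidates the speculatively loaded line afterwards, to show that the eviction still leaks.

```python
class DirectMappedCache:
    """Toy cache: one line per set, indexed by address modulo the set count."""

    def __init__(self, num_sets):
        self.lines = [None] * num_sets

    def access(self, address):
        """Return True on a hit; on a miss, load the line, evicting the old one."""
        index = address % len(self.lines)
        hit = self.lines[index] == address
        self.lines[index] = address
        return hit

    def invalidate(self, address):
        """Drop the line holding address, if it is still cached."""
        index = address % len(self.lines)
        if self.lines[index] == address:
            self.lines[index] = None


def prime_and_probe(victim_address, num_sets=8, invalidate_speculative_line=False):
    cache = DirectMappedCache(num_sets)
    attacker_addresses = list(range(num_sets))  # one attacker line per set
    for address in attacker_addresses:          # prime: fill every set
        cache.access(address)
    cache.access(victim_address)                # speculative load evicts one line
    if invalidate_speculative_line:
        cache.invalidate(victim_address)        # rollback does not restore the evicted line
    # Probe: any attacker line that now misses reveals the set the victim touched.
    return [a for a in attacker_addresses if not cache.access(a)]


print(prime_and_probe(victim_address=43))                                    # [3]
print(prime_and_probe(victim_address=43, invalidate_speculative_line=True))  # [3]
```

In both runs the attacker learns that the victim touched set 3 (43 modulo 8), which is why the comment suggests that extra cache capacity for speculative loads might be needed rather than invalidation alone.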
6. Re:Is there any other option, Linus? by Antiocheian · 2018-01-22 01:06 · Score: 3, Interesting
  
  There's no bloat in FFMPEG but that's the exception rather than the rule per Niedermayer's words:
  
  Old school: Use the lowest level language in which you can solve the problem conveniently.
  New school: Use the highest level language in which the latest supercomputer can solve the problem without the user falling asleep waiting.
  
  I think you'll agree that the new school is the majority.
7. Re:Is there any other option, Linus? by Anonymous Coward · 2018-01-22 01:32 · Score: 2, Interesting
  
  I also want fast machines for scientific data processing. We have better languages and better optimizers than were practical two decades ago and I like it.
  And if Fran Allen's opinion is of any weight, you actually would have had the same languages and optimizers those two decades ago if it weren't for C ruining the compiler landscape in the 1970s.
  
  ASM programmer here.
  Modern CPU is so totally fucked up - with layers upon layers of abstraction the task of real optimization becomes more and more hopeless.
8. Re:Is there any other option, Linus? by Anonymous Coward · 2018-01-22 02:48 · Score: 0, Interesting
  
  There's basically NO difference between a "desktop PC" and a "workstation". The names you're assigning are even stupid, since, last I checked, most work stations (where one would presumably find a "workstation" computer) include a desk, complete with a flat top... a desk-top.
  "Desktop" is a fantastic way to distinguish between a non-portable computer and a portable "laptop" one, but to draw a line between a "desktop" and a "workstation" is asinine.
  I build my own desktop computers, and the parts list is basically the same no matter the role: a case, mobo, CPU, some RAM, a storage device (HDD, SSD, whatever), and, optionally, a GPU. Maybe some other specialty stuff (I consider removable storage to be a specialty now), but those are the basics. Whether that mobo/CPU/GPU are optimized for one workload over another is a matter for consideration when selecting the parts, but there's no "workstation sauce" that makes one box a "workstation" and another box that lacks it is a "desktop".
  If all you're doing is browsing and frittering your life away on Facebook, get a chromebook, and don't spend more than $200. If you need computing power, whether for gaming or for work (or maybe for both), build a desktop.
  My work doesn't involve slinging GB of video around through various processing stages or anything. My work involves having anywhere from 3-10 separate projects ("solutions") open in Visual Studio 2008-2017 (multiple versions are needed, usually due to old production systems being built with something not supported past a certain version of VS), making parallel changes across all of them. Each instance of VS uses 150-ish MB of RAM without loading a project, and some of these projects, once loaded, use upwards of 1GB of RAM. With browsers being the memory hogs they've become lately, even 16GB is feeling a little tight these days. And any CPU sits at 25-30% load, pretty much non-stop doing background compilation and other stuff. But I don't usually bother with "workstation-class" parts for these machines, since the price premium is idiotic, and regular parts work just fine. Which is why the distinction between "desktop" and "workstation" is dumb.
9. Re:Is there any other option, Linus? by mwvdlee · 2018-01-22 03:05 · Score: 5, Interesting
  
  Linus seems to (begrudgingly) accept the need for a temporary fix and there is already a temporary fix that works for current CPUs.
  The problem is Intel calling it a permanent fix and implying that every future CPU will be insecure by default unless the OS flips a switch.
  That way Intel can blame any performance issues on the OS and still pretend their CPU is fast, even though it isn't when running in the secure mode that no sane person would ever use.
  How about a car analogy:
  Imagine all cars have two bugs in the gearbox that trigger on putting it in reverse certain ways.
  Bug 1 makes a dashboard light blink one time.
  All car manufacturers have this bug, and they all fixed it when found.
  Bug 2 makes your car explode.
  AMD and ARM knew about this and fixed it. It made their cars a bit slower, but at least they wouldn't explode.
  Intel knew about it too, but they chose to ignore it. Their cars are a bit faster because of this.
  Intel fixed this by sending out a widget that stops the car from exploding, this widget does make Intel cars go slower.
  The widget doesn't fix it automatically, though! The driver has to switch the widget on every time he starts the car. If the driver doesn't switch the widget on, putting the car in reverse will still make it explode.
  Intel also says that this is how all future cars will be prevented from exploding; by adding this widget to every future car and requiring the driver to switch it on; it'll always be in "explode-on-reverse" mode by default.
  Intel does get to claim their car is faster by default though. Just don't put it in reverse.
  As a bonus analogy; Intel claims both bugs are the same because they are both triggered by the same action, so therefore all car manufacturers are vulnerable to the exploding car bug.
  
  --
  Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
10. Re:Is there any other option, Linus? by Khyber · 2018-01-22 04:11 · Score: 1, Interesting
  
  "AMD is not vulnerable to MELTDOWN"
  Actually, attacks on encrypted memory are quite possible. DPA, baby. Re-write meltdown just a little bit and I could make it work on AMD systems.
  
  --
  Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
and your solution is? by Anonymous Coward · 2018-01-21 20:09 · Score: 1, Interesting

So Linus, what are you proposing as the solution? Or is this just yet another useless rant? Respect everything you have done, but Christ, sometimes you go on with utter bullshit. Yes, we all know the Intel situation is shit; yes, we all know the patches are fucking awful with huge performance impacts; but what other option have you come up with to avoid it?
1. Re:and your solution is? by drinkypoo · 2018-01-22 00:55 · Score: 5, Interesting
  
  we must fix things with what is possible, no matter how ugly.
  Intel went straight to ugly, and did not satisfactorily explore the realm of the possible. Linus perceived this, and announced it to the world. The ball is now in Intel's court. They can be responsible and competent, or the whole world can know that they are the fuckups that they are. It's their call.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
What is going on here...? by DrTJ · 2018-01-21 20:21 · Score: 4, Interesting

From the email correspondence, Linus says to Mr. Woodhouse:
"As it is, the patches are COMPLETE AND UTTER GARBAGE.
They do literally insane things. They do things that do not make
sense. That makes all your arguments questionable and suspicious. The
patches do things that are not sane.
WHAT THE F*CK IS GOING ON?"
In the post, Linus is not addressing much technical detail (he just mentions "garbage MSR writes", whatever that means), but his bullshit detector goes off big time.
It is clear that he thinks the patches are sub-optimal, but that in itself cannot be the first time in Linux kernel history. There seems to be something else behind it; otherwise, why would he ask the "WHAT THE F*CK IS GOING ON" question? Why does he play the "questionable" and "suspicious" card? Does he think that there is something shady going on at Intel that goes beyond the technical stuff?
Can anyone shed some light?
Re:Linus Haiku by AmiMoJo · 2018-01-21 20:35 · Score: 4, Interesting

So I'm gonna submit his email as evidence in my small claims court action against Intel.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Obligatory: Intel CPU Backdoor Report (Jan 1 2018) by Anonymous Coward · 2018-01-21 20:38 · Score: 1, Interesting

Change log:
2018/01/01 - Added 14 Useful Links. Disable Intel ME via undocumented NSA "High Assurance Platform" mode with me_cleaner, Blackhat Dec 2017 Intel ME presentation, Intel ME CVEs (CVSS Scored 7.2-10.0)
Intel CPU Backdoor Report
The goal of this report is to make the existence of Intel CPU backdoors a common knowledge and provide information on backdoor removal.
What we know about Intel CPU backdoors so far:
TL;DR version

Your Intel CPU and Chipset is running a backdoor as we speak.
The backdoor hardware is inside the CPU/Bridge and the backdoor firmware (Intel Management Engine) is in the chipset flash memory.

30C3 Intel ME live hack:
[Video] 30C3: Persistent, Stealthy, Remote-controlled Dedicated Hardware Malware
@21:43, keystrokes leaked from Intel ME above the OS, wireshark failed to detect packets.
[Quotes] Vortrag:
"the ME provides a perfect environment for undetectable sensitive data leakage on behalf of the attacker".
"We can permanently monitor the keyboard buffer on both operating system targets."

Decoding Intel backdoors:
The situation is out of control and the Libreboot/Coreboot community is looking for BIOS/Firmware experts to help with the Intel ME decoding effort.
If you are skilled in these areas, download Intel ME firmwares from this collection and have a go at them, beware Intel is using a lot of counter measures to prevent their backdoors from being decoded (explained below).

Backdoor removal:
The backdoor firmware can be removed by following this guide using the me_cleaner script.
Removal requires a Raspberry Pi (with GPIO pins) and a SOIC clip.
2017 Dec Update:
Intel ME on recent CPUs may be disabled by enabling the undocumented NSA HAP mode, use me_cleaner with -S option to set the HAP bit, see me_cleaner: HAP AltMeDisable bit.

Useful links (Added 2018 Jan 1):
Disabling Intel ME 11 via undocumented HAP mode (NSA High Assurance Platform mode)
me_cleaner: Set HAP AltMeDisable bit with -S option
Blackhat 2017: How To Hack A Turned Off Computer Or Running Unsigned Code In Intel Management Engine
EFF: Intel's Management Engine is a security hazard, and users need a way to disable it
Sakaki's EFI Install Guide/Disabling the Intel Management Engine
Intel ME bug storm: Hardware vendors race to identify and provide updates for dangerous Intel flaws.
CVE-2017-5689: An unprivileged network attacker could gain
Don't Bet On Malice When Stupidity Will Do? by ytene · 2018-01-21 23:21 · Score: 5, Interesting

You make some really interesting points around retpoline, but I wonder if this latest from Intel fails to account for this because they are being disingenuous, or because they continue to be a bunch of idiots?

We're seeing similar problems to this with other very-long-established technologies, such as Windows [with Windows 10]. Things that have worked for decades up until W10 are breaking, or they are breaking in new and frustrating ways.

For example, I have a triple-screen setup and using removable SSDs via a caddy unit, I can boot my computer into 2 different W10 instances, as well as multiple Linux builds. The 2 W10 instances behave in completely different ways, despite being set up, by me, with EXACTLY the same approach [scripted]. On one of them the Task Bar keeps relocating itself around the desktop, on the other it remains static. I've been back-and-forth with Microsoft and they don't know why...

At the root of the problem I suspect they have changed something in W10, written by someone no longer at the company, possibly poorly documented and possibly with unknown consequences.

Maybe Intel are having similar issues... A decision was made a very long time ago to do something insecure and stupid with speculative execution, but the person who made that decision is no longer with the company, so a new Team are trying to fix it and simply don't know what they're doing...

I honestly don't know what the source is, but I do know that I am seeing "existing" functionality break with much greater frequency on core platforms like this. It just smacks of carelessness...