Google's Project Zero Team Discovered Critical CPU Flaw Last Year (techcrunch.com)
An anonymous reader quotes a report from TechCrunch: In a blog post published minutes ago, Google's Security team announced what they have done to protect Google Cloud customers against the chip vulnerability announced earlier today. They also indicated their Project Zero team discovered this vulnerability last year (although they weren't specific with the timing). The company stated that it informed the chip makers of the issue, which is caused by a process known as "speculative execution." This is an advanced technique that enables the chip to essentially guess what instructions might logically be coming next to speed up execution. Unfortunately, that capability is vulnerable to malicious actors who could access critical information stored in memory, including encryption keys and passwords. According to Google, this affects all chip makers, including those from AMD, ARM and Intel (although AMD has denied they are vulnerable). In a blog post, Intel denied the vulnerability was confined to their chips, as had been reported by some outlets. The Google Security team wrote that they began taking steps to protect Google services from the flaw as soon as they learned about it.
> Recent reports that these exploits are caused by a “bug” or a “flaw” and are unique to Intel products are incorrect.
By using the AND statement there, a casual reader might think these not bugs or flaws in their processors. A close logical reading reveals the only reason this statement is accurate is because the second half is presumed "incorrect".
(1) There seems to be two separate exploits which you need to dig into the reporting to work. The Register's coverage is quite good and explains it all. "MELTDOWN" seems to be the more problematic one, and affects Intel and ARM chips. "SPECTRE" seems less problematic and affects AMD chips as well.
(2) AMD affected or not? Google says yeah, AMD says nay. However the wording from the LKML list is that "AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against". I think this references that the kernel patch is targeted against MELTDOWN, which does not affect AMD chips (see point 1)
(3) Although everyone's kicking Intel down, the main problem is that no-one can really trust each other now. I know there is a claim of "defective by design", but a lot of things can be described that way if they aren't used in their intended manner. In a "sane" world there would be no malicious actors trying to exploit what seems like quite a clever trick relying on timings (not a chip designer/expert). I read a lot of issues with the web came about, due to the fact that when it was designed everyone on the internet trusted each other, so security against bad apples wasn't designed in. As things have been commercialised you can see the effects, to the point that the only sane way to browse is using ad blockers and no script.
My thoughts on people suing Intel are a bit conflicted. Probably based on US law they would lose, but my analogy is like blaming (insert car manufacturer here) for selling you a car which crashes only when someone throws stones at it. We need stronger laws and protections against the rise in hostile actors.
(4) It's interesting that the Google blog post couldn't wait for the embargo-ed deadline of 9th January. They and their customers must have been getting really spooked. I suspect that this was being worked on and known by multiple parties, and a bit of coordination would have been good rather than panic.
(5) It'll be interesting to see what happens with regards to performance - from my understanding the SPECTRE variants just needs code recompilation. Most home workloads should not be affected by the two exploits, however I think if you are I/O heavy then it may be an issues.
Interesting time indeed.
Basically :
AMD checks access rights first and if rejected nothing much happens.
Meaning no leaks from kernel information into user-space running software.
- Google only demonstrated a in user-space software accessing its own in user-space info.
- And by using some non standard settings, it's possible to give bytecode to that kernel, and that piece of in-kernel software will access its own in-kernel info. (But you're already on the other side of the kernel fence)
Nothing gets accross the kernel fence.
Intel checks access rights much later on. By that time quite a lot has happened (e.g.: things could have been loaded in the cache, etc.). By measuring those things, you can deduce information that you should not have access to.
It means that a user-space software could end up getting sensitive information that normally should stay in-kernel.
These subtle timings of cache enable you to get information accross the kernel fence into user-land.
To mitigate these, each time a user-land software calls into a kernel function (e.g.: filesystem access), the OS needs to flush all it's space from the accessible space. This comes at a big performance cost.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
There are three different attacks in the blog post by Google's Security team. The first one, for example, works as follows: it loads from a kernel memory address; this will generate an exception, but before the exception is generated (because the page permission check is delayed to improve performance) the subsequent instructions are executed speculatively. None of the following instructions will ever commit, but they can have a noticeable impact on the processor state, as follows: they speculatively execute a load, based on the contents of the position loaded from the kernel space. The load is issued (but not committed), what caches a given memory location. The specific location is based on one bit of the .
When the first load is detected to be illegal, the instructions in the pipeline are flushed, but (the following is the critical part) the cached address remains in L1. By timing a memory access to the corresponding address, they can infer one bit of the given kernel memory. By repeating this, they can subsequently infer the whole word, one bit at a time.
How can they solve this issue? I can only foresee two alternatives:
- Perform permission checks earlier in the pipeline, but this requires modifying the processor microarchitecture. AMD cores are not affected by this attack, so their uarch probably checks permissions before issuing the load.
- Completely or partially flush the contents of the cache after a processor miss-speculation. This is probably the solution being implemented in the patches being developed.
Note that miss-speculations are VERY frequent, since most of the execution of Out-of-Order processors is speculative to improve performance. This explains the VERY significant performance penalties caused by the patches.