ARM TrustZone Hacked By Abusing Power Management (acolyer.org)


Posted by EditorDavid on Sunday September 24, 2017 @01:30AM from the in-the-chips dept.

"This is brilliant and terrifying in equal measure," writes the Morning Paper. Long-time Slashdot reader phantomfive writes: Many CPUs these days have DVFS (Dynamic Voltage and Frequency Scaling), which allows the CPU's clockspeed and voltage to vary dynamically depending on whether the CPU is idling or not. By turning the voltage up and down with one thread, researchers were able to flip bits in another thread. By flipping bits when the second thread was verifying the TrustZone key, the researchers were granted permission. If number 'A' is a product of two large prime numbers, you can flip a few bits in 'A' to get a number that is a product of many smaller numbers, and more easily factorable.
"As the first work to show the security ramifications of energy management mechanisms," the researchers reported at Usenix, "we urge the community to re-examine these security-oblivious designs."

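The summary's point about flipped bits and factoring can be illustrated with a toy sketch. This is not the researchers' code: the primes `P` and `Q`, the trial-division bound of 1000, and the helper names `strip_small_factors` and `flipped_variants` are all assumptions chosen to keep the example small. It flips each bit of a toy two-prime modulus in turn and strips off whatever small factors appear.

```python
# Toy illustration only: real RSA moduli are thousands of bits long.
P, Q = 10007, 10009          # two small primes standing in for large ones
N = P * Q                    # toy "RSA modulus", hard to split by small trial division


def strip_small_factors(n, bound):
    """Divide out every prime factor below `bound`.

    Returns (list_of_small_factors, remaining_cofactor).
    """
    factors = []
    d = 2
    while d < bound and n > 1:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    return factors, n


def flipped_variants(n):
    """Yield (bit_index, value) for every single-bit flip of n."""
    return [(bit, n ^ (1 << bit)) for bit in range(n.bit_length())]


if __name__ == "__main__":
    print("original:", strip_small_factors(N, 1000))
    for bit, m in flipped_variants(N):
        small, rest = strip_small_factors(m, 1000)
        print(f"bit {bit:2d}: {m} = {small} * {rest}")
```

The unmodified modulus yields no small factors at all, while flipping bit 0 makes the odd modulus even, so at least that variant immediately gains the factor 2.
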
Every time by DontBeAMoran · 2017-09-24 01:32 · Score: 4, Funny

Every time I hear about security, viruses and hacks, it's done via "opcodes", "registers" and "bits". Isn't it time we design more secure processors without these flaws?

--
#DeleteFacebook
1. Re:Every time by jellomizer · 2017-09-24 02:28 · Score: 2
  
  Normally at this level, the hack starts to cross the line from the digital to the analog. While most of us coders just worry about 0 and 1, on the processor we are looking at values relative to a threshold, where wires are so close that a power change could cause a little static arc that in theory can change a bit.
  However, these hacks normally need to be timed perfectly and with intimate knowledge of what is going on at that time. Such a hack would most likely cause a program to fail, or some bad data to be processed, which is bad, but no worse than the bugs in most applications or OSes, or just generic hardware failure.
  While I could see ARM would want to fix this, I don't currently see it as a major security concern, because if the hacker were to get to that level, they would have access to a lot more on the PC.
  
  --
  If something is so important that you feel the need to post it on the internet... It probably isn't that important.
2. Re:Every time by elrous0 · 2017-09-24 02:30 · Score: 5, Funny
  
  Better yet, don't name your product with a name that can later become ironic, like "TrustZone." Try naming your product "ShitStorm" or "ClusterFuck" instead. That way, when it gets hacked or turns out to be buggy as hell, you can say "What did you expect? We told you so upfront."
  
  --
  SJW: Someone who has run out of real oppression, and has to fake it.
3. Re:Every time by glitch! · 2017-09-24 04:32 · Score: 1
  
  Better yet, don't name your product with a name that can later become ironic, like "TrustZone." Try naming your product "ShitStorm" or "ClusterFuck" instead.
  
  "With a name like that, you know it HAS to be good!" Like that old Saturday Night Live skit where they come up with bad names for the jelly. "Fruckers! You know it must be good!" Followed by "Monkey Pus!", "Painful Rectal Itch!", and "Death Camp! Look for the barbed wire on the label!" Then "10,000 Nuns and Orphans! What's wrong with that? They were eaten by rats!"
  
  --
  A dingo ate my sig...
4. Re:Every time by TechyImmigrant · 2017-09-24 04:46 · Score: 1
  
  Other large semiconductor companies seem to be able to implement a secure enclave structure with dynamic voltage wobbling and managed to take fault injection seriously and they don't have these problems. It's heavy lifting to do a proper job, but in the case of RSA, it really isn't. Just sprinkle in some TMR and integrity testing with maybe a rail monitor and you will be good. I wonder why that isn't a part of TrustZone as standard. It should be.
  You are right. Management can't see security problems when you're building it and unless they've had some bitter experience, they don't know how to cost it.
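
The TMR (triple modular redundancy) idea mentioned above can be sketched in software as follows. This is only an illustration of the voting principle, not how TrustZone or any real secure enclave implements it; hardware TMR replicates the logic itself. The helpers `majority_vote`, `tmr`, and `make_glitchy` are hypothetical names, and the fault is simulated by flipping one bit in one of the three runs.

```python
def majority_vote(a, b, c):
    """Bitwise majority: each output bit is the value at least two inputs agree on."""
    return (a & b) | (a & c) | (b & c)


def tmr(compute, *args):
    """Run `compute` three times and vote on the integer results.

    Returns (voted_result, fault_detected).
    """
    results = [compute(*args) for _ in range(3)]
    fault_detected = not (results[0] == results[1] == results[2])
    return majority_vote(*results), fault_detected


def make_glitchy(compute, faulty_call, bit):
    """Wrap `compute` so that call number `faulty_call` returns a bit-flipped result."""
    calls = {"n": 0}

    def wrapped(*args):
        calls["n"] += 1
        result = compute(*args)
        if calls["n"] == faulty_call:
            result ^= 1 << bit      # simulated fault injection
        return result

    return wrapped


if __name__ == "__main__":
    glitchy_pow = make_glitchy(pow, faulty_call=2, bit=3)
    voted, fault = tmr(glitchy_pow, 7, 5, 11)
    print("voted result:", voted, "fault detected:", fault)
```

A single corrupted run is outvoted by the other two, and the disagreement itself is a signal that could trigger a lockdown instead of silently continuing.
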
  
  --
  I should use this sig to advertise my book ISBN-13 : 978-1501515132.
5. Re:Every time by gtall · 2017-09-24 06:00 · Score: 1
  
  I think if we could just have pink unicorns, we could ask their magical advice on how to design new processors.
6. Re:Every time by arglebargle_xiv · 2017-09-24 06:48 · Score: 1
  
  Actually, TrustZone is an excellent architecture.
  TrustZone is a terrible architecture. It started as a hash-for-secure-boot and then had more and more crap bolted onto it without rhyme or reason as the marketing folks sold it as all things to all people, with most of what was bolted on only partly finished or debugged, if that. The OP's suggestion that it be rebranded as ClusterFuck isn't too far off the target, because that's what it's turned into.
Easy fix by Anonymous Coward · 2017-09-24 01:39 · Score: 4, Insightful

Don't allow non operating system code to muck with the system clock. Problem solved. Why would this functionality ever be exposed? This is something that non-OS code should NEVER be able to do.
1. Re: Easy fix by sound+vision · 2017-09-24 04:03 · Score: 1
  
  Even if the program isn't given direct access to change the speed and voltage, it can trigger those changes indirectly.
2. Re:Easy fix by thegarbz · 2017-09-24 06:42 · Score: 1
  
  Non-OS code doesn't need that capability in order to perform those actions by proxy.
3. Re:Easy fix by gweihir · 2017-09-24 06:52 · Score: 1
  
  And then somebody hacks the OS and can compromise the Trust Zone anyways. No, what we need to do is secure the OS, because this is just one more case where anybody that owns the OS owns everything.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
4. Re:Easy fix by BadDreamer · 2017-10-02 00:33 · Score: 1
  
  Problem not solved, because breaking TrustZone means breaking the machine BENEATH the OS level.
Re:Would the Rust programming language help? by Anonymous Coward · 2017-09-24 01:56 · Score: 1

No, this is a hardware design flaw. It really has nothing to do with security other than it causing a security issue as a byproduct. Really there is no reason raising and lowering the voltage should flip bits at all other than someone made a big boo boo in the design.
You're kidding right? by Ayano · 2017-09-24 01:57 · Score: 3, Interesting

These Goldilocks voltages will vary by small margins.. too small to be accurately predicted for an actual attack.

TFA tries to make the argument that this physical hack can be done remotely despite the highly controlled conditions by relying on the power and energy management utilities...

Now I've got news as an embedded developer: that sh*t isn't accurate for anything this sensitive.

--
I don't read AC
1. Re:You're kidding right? by Anne+Thwacks · 2017-09-24 05:00 · Score: 5, Interesting
  
  This is the same, or very similar, to an Intel bug described about a month ago:
  The issue in both cases is either:
  a) The device can be set to operate under conditions that are known to cause it to be unreliable (be out of spec)
  b) The device fails to operate reliably when operated within spec
  If it is (a) then perhaps the manufacturer should test devices more thoroughly - and then blow fuses to limit operation within spec. If it is (b) the manufacturer should test the devices more thoroughly.
  You may know that (e.g.) Intel sells processors "locked" to prevent overclocking. This prevents (a). It obviously fails to prevent (b). Either the manufacturer chose not to lock the device (or the buyer chose not to buy locked ones) and the user was "free" to use the devices out of spec, or the article describes devices where the tests were inadequate.
  In reality, device performance is not consistent within a batch, and devices are sorted for performance - hence processors with different speed and power options. This has been true since the beginning of TTL. As devices have higher part count (see Moore's law) they have a higher probability of failure - since there are more failure modes, there is a much higher time-to-test. Time to test maps directly to device cost. Because time-to-test adds to cost, semiconductor devices are not tested 100%*: some parameters are, and others are only sampled to ensure that the tests are identifying the duds. The problem here is that the parameters tested by sampling may not be as reliably characterised as they are believed to be. If you assume that (for example) all static ram cells in the chip have essentially the same logic levels and speeds within a certain margin, and that margin has a wider spread between devices under circumstances that have not been identified, then testing some sample registers won't tell you that others are not reliable on chips with this unknown and unidentified problem.
  Complexity does not scale linearly with transistor count - it is partly that, but it also scales with number of modules, module complexity, and number of interfaces between modules (the hardware equivalent of APIs, not API instances). A more complex CPU has more of all three of these factors. Any way you look at it, a more complex chip will be more likely to fail in modes that are hard to identify.
  About 15 years ago, I was part of a team that identified a problem in a CPU of fairly low complexity caused by data leakage between pipeline stages in a processor used in safety critical applications (AFAIK, no one actually died as a result of these failures). These failure modes are very hard to find. This one took about a man-year of very expensive engineers using very expensive equipment.
  I predict that Moore's law will eventually hit the Thwacks Barrier: Processor complexity will reach the stage where a processor cannot be adequately tested within a timescale that makes it worth producing.
  I therefore hereby, formally pronounce that testability will be the barrier that ends Moore's law.
  *Some /. users who are old enough to afford lawns may recall the National Semiconductor Mil-spec scandal: devices were sold as 100% tested when in fact they were only sampled, because the failure rate was "very low". No aircraft carriers or space rockets were actually lost, but crimes were found to have been committed anyway.
  
  --
  Sent from my ASR33 using ASCII
2. Re:You're kidding right? by Zorpheus · 2017-09-24 05:11 · Score: 1
  
  You can run any number of tries though, until you manage to change a bit.
  I don't know, but you can probably also use any number of tries of getting a corrupted trustzone key?
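
The "any number of tries" argument can be made concrete with basic probability. This sketch assumes every attempt succeeds independently with the same probability `p`; the value 0.01 used below is a made-up illustration, not a figure from the paper, and the function names are hypothetical.

```python
import math


def success_probability(p, n):
    """Chance that at least one of n independent attempts succeeds."""
    return 1 - (1 - p) ** n


def attempts_needed(p, target):
    """Smallest n with success_probability(p, n) >= target (for 0 < p, target < 1)."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-attempt chance of a useful bit flip
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} tries -> {success_probability(p, n):.4f}")
    print("tries for 99% confidence:", attempts_needed(p, 0.99))
```

Even a rare fault becomes close to certain if the attacker can retry cheaply, which is why the crashes and reboots during profiling matter more than the low per-attempt odds.
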
3. Re:You're kidding right? by izzo+nizzo · 2017-09-25 15:35 · Score: 1
  
  This is fascinating. I'm curious if the bulk of the testing techniques are things that could eventually be automated. If AI could bite off some of the burden, perhaps the chips could still be tested in a feasible time frame.
Re:Would the Rust programming language help? by radish · 2017-09-24 02:00 · Score: 5, Informative

Not in this case. Rust (and similar programming approaches) prevent accidental interference between threads (of the same application) at the code execution layer - i.e. they prevent bugs due to programming errors. This attack is happening at the hardware level - the threads in question could be completely different applications and could be written in any language.

--
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
Sefcurify by JBMcB · 2017-09-24 02:14 · Score: 2

It would all be more secure if there were a backdoor engineered into the design so the government could have unfettered access to our data. You know, to make sure it's secure.

--
My Other Computer Is A Data General Nova III.
Targeted by JBMcB · 2017-09-24 02:18 · Score: 1

It would probably work for a *very* targeted attack. A specific rev of a specific device running a specific OS.
Useful for spooks, not much for anyone else. There were a bunch of these kinds of hacks in the NSA leaks - a MITM attack given a specific version of Apache, OpenSSL, and a specific version of a particular web browser.

--
My Other Computer Is A Data General Nova III.
1. Re: Targeted by Entrope · 2017-09-24 02:26 · Score: 1
  
  The voltage variations in question are driven by the random defects in the silicon and in the fabrication process, not so much the CPU design or the OS (or even firmware) running on the chip.
2. Re: Targeted by Anonymous Coward · 2017-09-24 03:28 · Score: 1
  
  But it's the same kind of vulnerability where you take advantage of a race condition and multithreading. In software, you set up some handlers to catch the segmentation faults, and whatever. But just keep trying again and again until you get lucky.
Broken Hardware by Anonymous Coward · 2017-09-24 02:26 · Score: 1

If the power management can change the state of the processing engine then the power management methodology is broken. There should be no way to flip bits or change any of the processing state by manipulating the power state. That this is possible shows a serious flaw in the design.
Re:Would the Rust programming language help? by jellomizer · 2017-09-24 02:33 · Score: 1

No, Rust isn't a magical device that makes all your computers secure.
It does help enforce better coding practices to make your code more secure.
However on some level the point of the programming language is to interact with the system hardware.
An immutable data type will prevent threads from changing the data. However, it doesn't stop the CPU from changing the data in the value, because something needs to clear the memory when the variable is no longer needed (such as leaving the nest or program end).

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Entrope is an idiot by Anonymous Coward · 2017-09-24 02:49 · Score: 1

The voltage variations in question are driven by the random defects in the silicon and in the fabrication process, not so much the CPU design or the OS (or even firmware) running on the chip.
RTFM, idiot:
Thus any frequency or voltage change initiated by untrusted code inadvertently affects the trusted code execution.
Yes its Targeted by johnjones · 2017-09-24 03:11 · Score: 2

The claim that you cannot manipulate the keys was made, and clearly that's not the case... the team at Columbia University (Adrian Tang, Simha Sethumadhavan, and Salvatore Stolfo) deserves credit for showing that was not always the case...
I wonder how many side attacks the PLA have...
john.jones.name
Re:Would the Rust programming language help? by Hognoxious · 2017-09-24 04:00 · Score: 1

Could it be stopped by making appropriate amendments to the Code of Conduct?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Rowhammer all over again by Dwedit · 2017-09-24 04:01 · Score: 2

This looks just like Rowhammer all over again. Flipping bits by messing with something nearby.
1. Re:Rowhammer all over again by viperidaenz · 2017-09-24 07:53 · Score: 1
  
  It's flipping bits by gaining root access, profiling the system, crashing it many times in the process, then mess with something nearby.
2. Re:Rowhammer all over again by swillden · 2017-09-25 00:05 · Score: 1
  
  It's flipping bits by gaining root access, profiling the system, crashing it many times in the process, then mess with something nearby.
  True, but that doesn't mean it's not bad.
  The whole point of TrustZone and similar technologies is to provide a place for computations that you wish to remain secure even in the event of complete compromise of the main operating system. Note that I'm not claiming that the attack is practical, it may or may not be sufficiently automatable to carry out remotely, on a large number of devices. That's for future research to determine. But it does make me nervous (my main project for the last four years is an Android subsystem that runs in TrustZone, SGX, etc.).
  Well, I should say it would make me nervous if there weren't much easier ways to attack TrustZone already, due not to weaknesses in TrustZone but to the operating systems that run there.
  
  --
  Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
If that's all... by johannesg · 2017-09-24 04:54 · Score: 1

I'm actually more terrified by the notion that activities in one core can cause bits to flip for another, completely randomly. We have a _lot_ of important stuff riding on the correct calculations happening in all those CPUs, worldwide, and the idea that you can pretty trivially cause random results is not a happy one.
Not too hard to fix by Anonymous Coward · 2017-09-24 05:15 · Score: 1

Making things secure is much harder than breaking into things. Given that, this one is an easy fix. The hypervisor controlling security can make sure the security states are stable before granting access (across DVFS variations). The security software can monitor voltage variations beyond what is allowed and lock down the system/user program (red alert).
Btw, voltage can be varied from outside without power management commands, bypassing the PM control software. So the best solution is voltage monitors (most chips have these).
Lastly, any time you mess with DVFS beyond what the hardware was designed for, you crash the system most of the time. This is not a reliable hack outside a lab-controlled environment.
Re:Would the Rust programming language help? by arglebargle_xiv · 2017-09-24 07:00 · Score: 2

No, you don't understand. You're thinking of Rust as a programming language when in fact it's a religion. Every time there's some post about bugs, flaws, or bad code, the Rust flakes turn up to enlighten the heathens with their religion/language with its guarantee of perfect, error-free code and operation. All hail the mighty Rust! Lead us into the light of your perfection!
Not quite so simple. by viperidaenz · 2017-09-24 07:52 · Score: 1

You need software access to the registers that control the core voltage regulators.
So you first need to gain root access.
They changed the DVFS tables to make the SoC run outside its operating range.
They had to profile the DVFS operating points for the specific device they used to find the right values to use. The profiling causes device reboots or freezes. Not something you can do without being noticed.
Step 1: probe DVFS tables, profile system to find points where it causes bit flips without rebooting or freezing.
Step 2: use performance counters to profile the victim code to find exactly when you need to trigger a fault
Step 3: load new values to DVFS table
Step 4: trigger a spin loop at the precise time in a core that shares the same clock-voltage values as the core the victim thread is running in, causing the system to change to the altered voltage/frequency point.
Step 5: profit?
The easiest way to mitigate this is to implement power saving better. Separate all the core frequencies and voltages, like Intel does already. The way it's done in ARM chips seems wasteful to me. Why would you raise the frequency and voltage of 4 cores when you only need one?
You could also not allow the performance counters to be used to profile code running at a higher privilege level.
Is anyone beginning to get the idea by Sqreater · 2017-09-25 00:04 · Score: 1

that you can't have computer security yet? That it is not possible? That what a man can make, a man can take apart?

--
E Proelio Veritas.
Re:RTFM by Anonymous Coward · 2017-09-25 01:32 · Score: 1

RTFM indeed, it isn't manipulated by the order of the executed instructions but by telling the dvfs system to change the voltage and frequency. So limiting access to the DVFS should mitigate the issue enough such that they'll already have root access on the system before they can mount the attack.
Semi-practical vs entrenched flaws by EndlessNameless · 2017-09-25 03:14 · Score: 1

Apps cannot be granted permission to control DVFS, which is necessary to induce the faults, but they can manipulate it because Android responds to the application's load/behavior.
However, the application has no specific knowledge of the overall system load and therefore it cannot consistently induce faults. The scenario in a lab is probably far, far easier than real life---it eliminates the effect of other apps, network state changes, etc on the power state.
Very clever proof of concept, but it will take a Herculean effort to turn this into an effective attack in the wild.
Fixing the problems will require all parties. There are elements under the control of ARM directly as well as the SoC designers. Android may be able to mitigate the issues at the OS level---but I assume that would penalize battery life, system performance, or both.

--

---
According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.