
Userspace privilege escalation vulnerability on Cortex M

High
ceolin published GHSA-3r6j-5mp3-75wr Nov 11, 2025

Package

zephyr (zephyr)

Affected versions

<= 4.2

Patched versions

None

Description

Exploit

System call entry on Cortex M (and possibly R and A, but I think not) has a race which allows very practical privilege escalation for malicious userspace processes.

Explanation

The bug being exploited here is a race on entry to the system call. ARM syscalls work mostly conventionally: the userspace process traps to an SVC exception, and the kernel then sets things up so that the return from the exception goes into the system call handler with the low (nPRIV) bit of the CONTROL register clear, allowing privileged access. But the bug is that the return happens on the interrupted thread's stack. That is memory writable[1] by the interrupted thread, and also by any other thread in the same memory domain. And it stores the return address that the kernel is going to jump to with elevated privilege!

So the trick is that you (1) repeatedly take a system call exception from a low-priority thread, (2) arrange for a high-priority thread to interrupt it at the critical spot where the kernel thinks it has written z_arm_do_syscall to the return PC but is still operating on the writable stack, and (3) clobber that PC with an attack address.

In practice the timing target is very large: it takes only a few dozen tries in QEMU to hit the race.

And to be clear, there's a little more complexity at work. This relies on the Cortex M "tail chaining" feature, which means that an exception return when PendSV has been flagged (by the interrupting attacker, in this case) does not actually return to the pushed stack frame before taking the next exception. And of course the action of PendSV is just to save off the interrupted PSP stack pointer without inspection. For this reason I think the non-M CPUs are not vulnerable here, despite sharing the same userspace entry code, because they won't run any other thread code before the original syscall returns.

(But that said: the idea of running unlocked on a mutable-to-attackers stack is just bonkers to me, and I'll bet anything there are more subtle flaws still lurking. I probably just caught the easy one.)

[1] Or can be writable. By default ARM disallows MPU access to thread stacks from outside the thread. This has always been a weird feature (unlike Unix/Linux, and even other Zephyr architectures), but I'm realizing now it may have been the result of an attempt to avoid exactly this vulnerability. But it's not much protection in practice: note the exploit needs to take the extra step of switching the stack pointer to a regular array and not a kernel stack object.

Fix Alternatives

There's a fix for this hidden[1] in the ARM Cortex M arch_switch() PR. It works by taking a lock on exit from the SVC in the exception handler and holding it all the way through entry to z_arm_do_syscall() in thread mode. This doesn't prevent the PendSV (which can't be masked by interrupt locks), but it does prevent anything else (like the timer interrupt in this exploit) from preempting the thread, which is good enough. It then holds the lock until it has safely switched to the privilege stack, and retakes it before switching back to the userspace stack. This could be split from arch_switch() and applied as a backport, with a little work.

Other options are possible for legacy code too: the intra-thread-stack MPU protection could be hardened by checking for a valid SP (i.e. one inside the known stack bounds of the process) on exit from PendSV. Also CONFIG_BUILTIN_STACK_GUARD works to prevent this as long as the thread stacks are always physically above the .data section in memory (i.e. the stack swap trick the exploit uses would look like a stack overflow and cause a trap). Not sure if that's true for all platforms, but it is for most, I think.

But really the Right Fix here is to abandon the current design for system call entry. The kernel should arrange for the creation of a resumable stack frame on the privileged stack directly, and return to that instead of to the user stack. This might even be a simpler scheme: you could do much more of the logic in portable C code in the kernel handler instead of in the assembly entry code (which is currently trifurcated into separate instruction-set variants!). I'll give this a shot, time permitting.

[1] IMHO fairly cleverly, wrapped along with code to address a different, arch_switch-related race in exactly the same entry code.

Patches

main: #95101 #96850
4.2: #97306 #96014
4.1: #97305 #96015
3.7: #97313 #96030

For more information


embargo: 2025-11-10

Severity

High


CVSS v3 base metrics

Attack vector
Local
Attack complexity
High
Privileges required
None
User interaction
None
Scope
Changed
Confidentiality
High
Integrity
High
Availability
High

CVSS:3.1/AV:L/AC:H/PR:N/UI:N/S:C/C:H/I:H/A:H

CVE ID

CVE-2025-9408

Weaknesses

Privilege Context Switching Error

The product does not properly manage privileges while it is switching between different contexts that have different privileges or spheres of control.

Credits