NEWS Five years in the shadows, millions of users at risk. Intel disclosed a critical bug in the Linux kernel.

pinkman

The patch is already in Linux 6.19 and will be backported to stable releases so that interrupts are disabled correctly in all scenarios.


A bug that had been lurking in the Linux kernel for years has been discovered in one of the system's most sensitive areas: page fault handling on x86 systems. This is the moment when the processor suddenly realizes that a program has accessed memory that is currently inaccessible, and the kernel must quickly and accurately figure out what to do next. It turns out that since 2020, this logic has contained a subtle but fundamental flaw: interrupts weren't always disabled as intended.

The fix has already been accepted into the Linux 6.19 branch, and it is planned to be ported to older stable releases. The initiator of this fix was Intel engineer Cedric Xing, who uncovered the problem in the page fault exception handling code and proposed a simpler, yet more robust, approach.

The crux of the story is an old comment in the x86 page fault code around do_page_fault(). For years, it explained that interrupts could be re-enabled while handling memory access faults, especially faults at user addresses. The comment itself acknowledged that it was nearly impossible to walk through every exit path and guarantee the correct interrupt state at each one: doing so would require either a "hellish" combinatorial repair or a complete reversal of the logic.

But as it turned out, this comment was slightly incorrect, and so was the logic surrounding it. The problem wasn't limited to "errors at user addresses." The handler conflated two different concepts: the address range that faulted (user or kernel) and the context in which the access occurred (user mode or kernel mode). These concepts are intuitively related, but in practice they are not the same. A kernel address can be accessed from user context, and in that case certain processing branches can enable interrupts, even though interrupts must be disabled again before control returns to the low-level exception handler.
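To make the distinction concrete, here is a minimal Python sketch of the two independent axes. It is a toy model, not kernel code: the `Fault` type, the `classify` helper, and the `KERNEL_SPACE_START` boundary value are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical boundary between user and kernel addresses (illustrative only).
KERNEL_SPACE_START = 0xFFFF_8000_0000_0000


@dataclass(frozen=True)
class Fault:
    address: int          # the address whose access faulted
    from_user_mode: bool  # True if the CPU was executing user-level code


def classify(fault: Fault) -> tuple[str, str]:
    """Return (address range, execution context) as two separate answers."""
    address_range = "kernel address" if fault.address >= KERNEL_SPACE_START else "user address"
    context = "user context" if fault.from_user_mode else "kernel context"
    return address_range, context


# All four combinations are possible; the third one is the case the
# old special-casing did not account for.
examples = [
    Fault(0x0000_7FFF_0000_1000, from_user_mode=True),
    Fault(0x0000_7FFF_0000_1000, from_user_mode=False),
    Fault(0xFFFF_FFFF_8000_0000, from_user_mode=True),
    Fault(0xFFFF_FFFF_8000_0000, from_user_mode=False),
]

for fault in examples:
    print(hex(fault.address), *classify(fault))
```

The point of the sketch is only that a check on the address answers a different question than a check on the context, so logic keyed on one of them cannot stand in for the other.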

One example that surfaced during analysis involved the __bad_area_nosemaphore() branch: it does attempt to restore the "correct" state by enabling and then disabling interrupts, but it doesn't do so everywhere and doesn't always do so consistently. This resulted in an asymmetry: depending on the execution path, interrupts could remain enabled where the opposite was expected.

The engineers concluded that trying to carefully patch up all the branches was pointless. Instead, they adopted what is generally considered the safest approach in such areas: unconditionally disabling interrupts again in one specific location, just before returning to the low-level part of page fault processing. To achieve this, they removed the incomplete special case that tried to handle the problem selectively and replaced it with a simple rule: no matter which address caused the fault, the interrupt state is guaranteed to be back in the expected state before exiting.
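The difference between the selective special case and the unconditional rule can be sketched in a few lines of Python. This is a simplified model built from the description above, not the kernel's actual code: the `Cpu` class, both dispatch functions, and the assumption about which branch leaves interrupts on are hypothetical.

```python
class Cpu:
    """Toy CPU state: exception entry starts with interrupts disabled."""

    def __init__(self) -> None:
        self.irqs_enabled = False


def handle_user_address(cpu: Cpu) -> None:
    # Assumed behavior: this path may need to sleep, so it enables interrupts.
    cpu.irqs_enabled = True


def handle_kernel_address(cpu: Cpu, from_user_mode: bool) -> None:
    # Assumed behavior: when the access came from user context, an error
    # path enables interrupts and does not reliably disable them again.
    if from_user_mode:
        cpu.irqs_enabled = True


def dispatch_selective(cpu: Cpu, kernel_address: bool, from_user_mode: bool) -> None:
    """Model of the old logic: clean up only after the user-address branch."""
    if kernel_address:
        handle_kernel_address(cpu, from_user_mode)
    else:
        handle_user_address(cpu)
        cpu.irqs_enabled = False  # special case covers this branch only


def dispatch_unconditional(cpu: Cpu, kernel_address: bool, from_user_mode: bool) -> None:
    """Model of the fixed logic: one cleanup point that every path reaches."""
    if kernel_address:
        handle_kernel_address(cpu, from_user_mode)
    else:
        handle_user_address(cpu)
    cpu.irqs_enabled = False  # always, regardless of address or context


def leaky_paths(dispatch) -> list[tuple[bool, bool]]:
    """Return the (kernel_address, from_user_mode) cases that exit with IRQs on."""
    leaks = []
    for kernel_address in (False, True):
        for from_user_mode in (False, True):
            cpu = Cpu()
            dispatch(cpu, kernel_address, from_user_mode)
            if cpu.irqs_enabled:
                leaks.append((kernel_address, from_user_mode))
    return leaks


print("selective:", leaky_paths(dispatch_selective))
print("unconditional:", leaky_paths(dispatch_unconditional))
```

In this model the selective version leaks exactly one combination, a kernel address touched from user context, while the unconditional version cannot leak at all, because the cleanup no longer depends on which branch ran.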

Interestingly, the root cause of the problem lies in changes introduced into the kernel during the Linux 5.8 merge window in 2020. The bug is now fixed in the current branch, and the plan is for the fix to reach supported stable releases as well. For users, this isn't a "speedup" or a significant new feature, but rather more correct and predictable kernel behavior in rare but potentially dangerous situations where a small asymmetry can carry a high price.
 