AWS engineer reports PostgreSQL perf halved by Linux 7.0, fix may not be easy
Scope of the Regression
- Reported: PostgreSQL throughput dropped to ~50% on Linux 7.0 under certain conditions.
- Initially thought to be ARM64-only, but later reproduced on x86_64 once huge pages were disabled, so the architecture was a red herring.
- Regression appears only in extreme configurations: very large shared memory (tens–hundreds of GB), many cores, 4K pages, no huge pages.
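A rough count shows why very large shared memory combined with 4K pages is the extreme case. The sketch below counts how many pages are needed to map a shared memory segment with 4 KiB pages versus 2 MiB huge pages. The 100 GiB size and the 2 MiB huge page size (a common x86_64 default) are illustrative assumptions. Each process that maps the segment populates its own page tables on first touch, so every one of those pages is a potential minor fault per backend.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(size_bytes: int, page_size: int) -> int:
    """Number of pages of `page_size` needed to cover `size_bytes`, rounded up."""
    return -(-size_bytes // page_size)


shared_memory = 100 * GIB   # illustrative shared memory size (assumption)
small_page = 4 * KIB        # base page size from the regression reports
huge_page = 2 * MIB         # typical x86_64 huge page size (assumption)

small_pages = pages_needed(shared_memory, small_page)
huge_pages = pages_needed(shared_memory, huge_page)

print(f"4 KiB pages: {small_pages:,}")             # 4 KiB pages: 26,214,400
print(f"2 MiB pages: {huge_pages:,}")              # 2 MiB pages: 51,200
print(f"ratio:       {small_pages // huge_pages}x")  # ratio:       512x
```

The 512x gap in page count is the intuition behind huge pages making faults inside critical sections far rarer.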
Huge Pages and Configuration
- Follow-up LKML posts indicate that enabling huge pages largely eliminates the regression.
- Several commenters argue that running large PostgreSQL instances without huge pages is already a “bad” or at least suboptimal configuration, especially with 100GB+ buffer pools.
- Concern raised that in containerized/cloud environments, DB operators may not control huge page settings, so “bad” configs are common in practice.
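Operators who are unsure whether a host or container has huge pages available can look at the HugePages_* and Hugepagesize fields in /proc/meminfo. The sketch below parses those fields from meminfo-style text and estimates how many huge pages a segment would need. The sample text and the 100 GiB target are made-up values, and the check covers only explicitly reserved (hugetlbfs) pages, not transparent huge pages.

```python
import re


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; values keep the file's units."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([^:]+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def huge_pages_needed(size_bytes: int, hugepagesize_kb: int) -> int:
    """Huge pages required to back `size_bytes`, rounded up."""
    page_bytes = hugepagesize_kb * 1024
    return -(-size_bytes // page_bytes)


# Hypothetical host with no huge pages reserved.
SAMPLE = """\
MemTotal:       527965788 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

info = parse_meminfo(SAMPLE)
needed = huge_pages_needed(100 * 1024**3, info["Hugepagesize"])
# Reserved-but-unfaulted pages are counted in HugePages_Free, so subtract them.
available = info["HugePages_Free"] - info["HugePages_Rsvd"]
print(f"available: {available}, needed: {needed}")
if available < needed:
    print("not enough huge pages for this segment")

# On a real Linux host:
# with open("/proc/meminfo") as f:
#     info = parse_meminfo(f.read())
```

PostgreSQL’s huge_pages setting defaults to try, which falls back to regular pages instead of failing when huge pages cannot be obtained. That makes it easy to end up in the 4K-page configuration without noticing.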
Preemption Changes in Linux 7.0
- Root issue is linked to the new PREEMPT_LAZY behavior and removal of PREEMPT_NONE as a choice.
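- PostgreSQL uses spinlocks in shared memory. With 4K pages, backends take minor page faults inside spinlock-protected critical sections, and under the new preemption model the lock holder is now preempted more often at that point; other backends then spin on a lock whose holder is not running, causing severe contention.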
- Huge pages reduce the frequency of minor faults, making the effect much smaller.
- It’s noted that dynamic preemption (PREEMPT_DYNAMIC) may let some kernels switch back to the older behavior, but PREEMPT_NONE is no longer universally available.
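Lock-holder preemption can be illustrated with a toy spinlock. In the hypothetical sketch below, a holder takes a test-and-set style lock and then stalls inside the critical section, with time.sleep() standing in for a page fault followed by preemption. A waiter thread keeps retrying and counts the attempts it wastes. This is Python with the GIL, not PostgreSQL’s s_lock, so only the qualitative effect carries over: while the holder is off-CPU, waiters make no progress and burn cycles.

```python
import threading
import time


class ToySpinLock:
    """Hypothetical test-and-set style spinlock (illustration only, not PostgreSQL's s_lock)."""

    def __init__(self) -> None:
        self._flag = threading.Lock()
        self.spins = 0  # failed acquisition attempts (one waiter in this demo, so no race)

    def acquire(self) -> None:
        # Busy-wait: retry immediately instead of sleeping until the lock is free.
        while not self._flag.acquire(blocking=False):
            self.spins += 1

    def release(self) -> None:
        self._flag.release()


def holder_stall_demo(stall_seconds: float) -> int:
    """Return how many times a waiter spun while the holder stalled for `stall_seconds`."""
    lock = ToySpinLock()

    def waiter() -> None:
        lock.acquire()
        lock.release()

    lock.acquire()                # holder enters the critical section
    t = threading.Thread(target=waiter)
    t.start()
    time.sleep(stall_seconds)     # stand-in for a page fault plus preemption while holding the lock
    lock.release()                # only now can the waiter make progress
    t.join()
    return lock.spins


for stall in (0.001, 0.05):
    print(f"holder stalled {stall * 1000:.0f} ms -> waiter spun {holder_stall_demo(stall):,} times")
```

The exact spin counts depend on scheduling and vary from run to run.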
“Never Break Userspace” vs Performance Regressions
- Multiple commenters frame a 50% slowdown of a major database as effectively “breaking userspace,” pushing against the idea that this is “just” a performance change.
- Others counter that the kernel cannot freeze evolution every time some workload slows down and that perf regressions are not ABI breaks.
- There is criticism of introducing a new low-level mechanism in 7.0 and then expecting userspace to adopt it immediately to avoid a regression also introduced in 7.0.
Production Practices and Testing
- Many argue serious production systems will stay on older LTS kernels for years and won’t see this soon.
- Others push back, noting new deployments will land on Ubuntu 26.04 (with 7.0) quickly, and “if it ain’t broke, don’t fix it” can lead to under-tested upgrades later.
- Several emphasize the need for some users to run the latest kernels precisely to catch regressions like this.
Spinlocks, Alternatives, and Design Critique
- Some criticize userspace spinlocks altogether, suggesting futex-based mutexes or kernel-assisted mechanisms (e.g., rseq) for better behavior under preemption.
- PostgreSQL’s use of spinlocks is defended as historical and sometimes still performance-driven; replacing them isn’t trivial due to memory-barrier costs, data layout, and legacy platform support.
- It’s clarified that the regression would also exist with typical futex-based locks, because non-PI (non-priority-inheritance) futexes don’t transfer CPU time to lock holders; the key is longer critical sections combined with preemption, not spinlocks per se.
- One PostgreSQL developer notes that the specific hot spinlock in the benchmark was already known to be “stupid” and unused in normal operation, and that the benchmark represents an “absurd” configuration.
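The argument that blocking locks would not avoid the problem can also be sketched. The hypothetical lock below spins a bounded number of times and then blocks in the OS. It is loosely in the spirit of spin-then-sleep strategies but is neither PostgreSQL’s code nor a real futex. Once the waiter blocks it stops burning CPU, yet it still cannot enter the critical section until the stalled holder runs again, so its wait tracks the holder’s stall either way.

```python
import threading
import time


class SpinThenBlockLock:
    """Hypothetical lock: spin up to `max_spins` attempts, then block in the OS."""

    def __init__(self, max_spins: int = 1000) -> None:
        self._lock = threading.Lock()
        self.max_spins = max_spins

    def acquire(self) -> str:
        for _ in range(self.max_spins):
            if self._lock.acquire(blocking=False):
                return "spun"         # got the lock while spinning
        self._lock.acquire()          # stop spinning and sleep until the holder releases
        return "blocked"

    def release(self) -> None:
        self._lock.release()


def measure_wait(stall_seconds: float) -> tuple[str, float]:
    """Time how long a waiter needs to get the lock while the holder stalls."""
    lock = SpinThenBlockLock()
    result: dict = {}

    def waiter() -> None:
        start = time.monotonic()
        result["how"] = lock.acquire()
        result["waited"] = time.monotonic() - start
        lock.release()

    lock.acquire()                    # holder takes the lock
    t = threading.Thread(target=waiter)
    t.start()
    time.sleep(stall_seconds)         # holder is off-CPU inside the critical section
    lock.release()
    t.join()
    return result["how"], result["waited"]


how, waited = measure_wait(0.1)
print(f"waiter {how}; waited {waited * 1000:.0f} ms during a 100 ms holder stall")
```

Switching from spinning to sleeping saves CPU but not latency; only getting the holder back onto a CPU sooner shortens the wait.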
ARM64, Ecosystem, and Broader Impact
- Commenters observe that ARM64 Linux often receives less real-world testing, leading to surprising regressions, though in this case the issue isn’t ARM-specific.
- There is worry that if PostgreSQL hits such a cliff, other less-scrutinized software may quietly degrade on 7.0, leading to subtle “enshittification.”
- Others argue we should wait for concrete cases beyond this contrived benchmark before demanding kernel reverts.
Media and Communication
- Some criticize the original article as sensational, arguing that the LKML thread itself quickly narrowed the issue to misconfiguration (no huge pages).
- Others insist that, even if mitigated by configuration tweaks, a major performance drop under realistic-enough settings is still worth flagging and shouldn’t just be dismissed.