2024-08-25

Linux Pipes Are Slow

Role and Usefulness of Pipes

Strong split between “pipes are archaic, avoid them” vs. “pipes are core to Unix composability.”
Many value pipes for shell scripting: replacing large scripts with one-liners using pipes/xargs; modular UNIX tools interconnected cheaply.
Others note pipes are not always appropriate, especially when latency, async I/O, or complex control are involved.

Performance, “Slowness,” and When It Matters

Many commenters say pipes are rarely the bottleneck: Linux pipes can push tens of GB/s, which is “fast enough” for most workloads, like a Corolla vs. a race car.
Some real-world cases (filesystems, storage frontends, high-throughput video pipelines) have hit pipe throughput or copying limits and moved to shared memory or other mechanisms.
Pipes are criticized for needless copying vs. zero-copy designs and for being slower than “long-distance” function calls or modern socket optimizations.
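
Whether a pipe is the bottleneck is easy to check before reaching for shared memory. The following is a minimal sketch, not a rigorous benchmark: the function name measure_pipe_throughput and its default sizes are illustrative assumptions, and Python's own overhead means it will understate what a C program can reach. The result depends heavily on CPU, kernel version, and chunk size.

```python
import os
import threading
import time


def measure_pipe_throughput(total_bytes=64 * 1024 * 1024, chunk_size=64 * 1024):
    """Push total_bytes through an os.pipe() and return (bytes_read, seconds)."""
    read_fd, write_fd = os.pipe()
    chunk = memoryview(b"\0" * chunk_size)

    def writer():
        remaining = total_bytes
        try:
            while remaining > 0:
                # os.write may accept fewer bytes than requested, so count what it reports.
                remaining -= os.write(write_fd, chunk[:min(chunk_size, remaining)])
        finally:
            os.close(write_fd)  # closing the only write end gives the reader EOF

    thread = threading.Thread(target=writer)
    start = time.perf_counter()
    thread.start()

    bytes_read = 0
    while True:
        data = os.read(read_fd, chunk_size)
        if not data:  # empty read: every write end has been closed
            break
        bytes_read += len(data)

    elapsed = time.perf_counter() - start
    thread.join()
    os.close(read_fd)
    return bytes_read, elapsed
```

Dividing the returned byte count by the elapsed seconds gives a rough throughput figure to compare against what the workload actually needs.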

Nonblocking Semantics and Fragility

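Confusion and correction around O_NONBLOCK: it is a file status flag stored on the open file description, not on the file descriptor. The two ends of a Linux pipe are separate file descriptions, so setting it on one end doesn’t alter the semantics of the other.
Common bug: a process flips O_NONBLOCK on a descriptor whose file description is shared (inherited across fork or duplicated with dup), which unexpectedly changes behavior for every other holder.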
Using nonblocking pipes with stdio (e.g., printf) is generally unsafe because callers don’t handle EAGAIN / partial writes.
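
The scope of the flag can be checked directly. The sketch below is illustrative only (the helper names are made up for this example): it sets O_NONBLOCK on the write end of a pipe, then inspects a dup of that end and the read end. It also fills a nonblocking pipe to show the EAGAIN condition that stdio-style callers typically do not handle.

```python
import errno
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on fd's open file description."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def demo_nonblocking_scope():
    """Set O_NONBLOCK on one descriptor and report who else sees it."""
    read_fd, write_fd = os.pipe()
    dup_write_fd = os.dup(write_fd)  # new descriptor, same open file description
    try:
        os.set_blocking(write_fd, False)
        return {
            "write_end": is_nonblocking(write_fd),
            "dup_of_write_end": is_nonblocking(dup_write_fd),
            "read_end": is_nonblocking(read_fd),
        }
    finally:
        for fd in (read_fd, write_fd, dup_write_fd):
            os.close(fd)


def fill_nonblocking_pipe():
    """Write until the pipe is full; return (bytes_written, hit_eagain)."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, b"x" * 4096)
            except BlockingIOError as exc:
                # A blocking writer would sleep here; a nonblocking one must retry later.
                return written, exc.errno == errno.EAGAIN
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

On Linux, demo_nonblocking_scope() should report the flag as set on the write end and its dup but not on the read end, which is the sharing behavior behind the "flipped FD" bug described above.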

Kernel / Copy Details

Discussion of rep movsb vs SIMD: modern CPUs often accelerate rep movsb, but thresholds and best choices depend on CPU and data size.
Linux’s memcpy/memmove implementations and their size thresholds are tuned and regularly updated; the trade-off is between peak speed and keeping the code small and branch-light.
Some kernel disassembly details explained by retpoline/CONFIG_RETHUNK and SMAP (CLAC/STAC) patching.

Proposed Improvements and Alternatives

Proposal: kernel syscall exposing ringbuffers for file descriptors, including pipes, possibly mapped on both ends for zero-copy, poll/futex-friendly.
Concerns: more complex user-space semantics, potential for brittle behavior if not carefully designed, but others say it’s similar to shared memory + eventfd.
Suggestions to benchmark io_uring-based designs and domain sockets, especially for high-throughput video workflows.
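
To make the "shared memory" comparison concrete, here is a user-space sketch of a byte ring buffer stored in a single shared mapping. It is not the proposed syscall, and the RingBuffer class and its layout are hypothetical. It shows only the data layout. The wakeup side (eventfd, futex, or poll) is omitted, and a real cross-process version would need atomic loads and stores with acquire/release ordering, which this Python code does not provide.

```python
import mmap
import struct

HEADER = struct.Struct("<QQ")  # head = total bytes written, tail = total bytes read
U64 = struct.Struct("<Q")


class RingBuffer:
    """Hypothetical single-producer, single-consumer ring in one shared mapping."""

    def __init__(self, capacity):
        self.capacity = capacity
        # On Unix an anonymous mmap is MAP_SHARED by default, so a forked child
        # would see the same pages.
        self.mem = mmap.mmap(-1, HEADER.size + capacity)
        HEADER.pack_into(self.mem, 0, 0, 0)

    def write(self, data):
        """Copy in as much of data as fits; return the byte count (0 if full)."""
        head, tail = HEADER.unpack_from(self.mem, 0)
        n = min(len(data), self.capacity - (head - tail))
        start = head % self.capacity
        first = min(n, self.capacity - start)  # bytes before the wrap point
        base = HEADER.size
        self.mem[base + start:base + start + first] = data[:first]
        if n > first:  # remainder wraps to the front of the data area
            self.mem[base:base + n - first] = data[first:n]
        U64.pack_into(self.mem, 0, head + n)  # publish the new head last
        return n

    def read(self, max_bytes):
        """Remove and return up to max_bytes (empty bytes if nothing is queued)."""
        head, tail = HEADER.unpack_from(self.mem, 0)
        n = min(max_bytes, head - tail)
        start = tail % self.capacity
        first = min(n, self.capacity - start)
        base = HEADER.size
        out = self.mem[base + start:base + start + first] + self.mem[base:base + n - first]
        U64.pack_into(self.mem, U64.size, tail + n)  # publish the new tail last
        return out
```

Unlike a pipe, neither side makes a system call to move data here; the cost moves into the notification mechanism and the extra care the semantics demand, which is the concern raised above.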

Economics and Philosophy of Optimization

Debate over whether shaving a few percent off ubiquitous primitives is “worth it.”
One side: micro-optimizing pipes is premature for most users and adds complexity.
Other side: small, widespread gains compound globally (time, energy, emissions); optimizing core primitives is justified even if individual benefits are invisible.
