It's Always TCP_NODELAY
Practical Experiences & Performance Wins
- Multiple commenters report big latency improvements after disabling Nagle via TCP_NODELAY in:
  - Chatty protocols (e.g., DICOM on LAN, database client libraries, student TCP simulators, SSH-based games).
  - Cases where messages are ready in user space but sit unsent due to kernel buffering.
- Go is noted as disabling Nagle by default, which surprised some who were debugging latency.
- Some mention using LD_PRELOAD hacks or libraries (e.g., libnodelay) to force TCP_NODELAY for legacy binaries.
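For readers who have not set the option before, here is a minimal sketch of what "disabling Nagle" looks like from application code. The helper names `make_low_latency_socket` and `nagle_disabled` are hypothetical and do not come from the thread; the socket calls are the standard Python `socket` module ones.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on, so small writes are sent
    # without waiting for earlier data to be acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

The option can be set before `connect()`. With Nagle off, every small `send()` may become its own packet, so the application should assemble a complete message before writing it.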
Nagle vs Delayed ACK & TCP_QUICKACK
- A recurring theme is that Nagle’s algorithm and delayed ACKs interact badly:
  - Nagle holds back small segments until outstanding data is ACKed, while delayed ACK holds back the ACK hoping to piggyback it on a reply. Each side ends up waiting on the other, causing stalls of 100–200 ms or worse.
  - Historical context: early TCP stacks used long global ACK timers (~500 ms).
- TCP_QUICKACK can reduce receive-side ACK delay but doesn’t fix send-side buffering. Portability across OSes is uneven.
- One suggestion: TCP stacks should track whether delayed ACKs actually get piggybacked and disable them per-socket when they don’t.
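The portability caveat around TCP_QUICKACK can be illustrated with a guarded sketch. It assumes a Linux-style stack, where the flag is not sticky and is commonly re-applied after each receive. The function names `request_quick_ack` and `recv_with_quick_ack` are hypothetical.

```python
import socket


def request_quick_ack(sock) -> bool:
    """Ask the stack to ACK immediately; return False where unsupported."""
    # TCP_QUICKACK is only exposed by the socket module on some platforms
    # (notably Linux), so look it up defensively.
    opt = getattr(socket, "TCP_QUICKACK", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, 1)
    except OSError:
        return False
    return True


def recv_with_quick_ack(sock, bufsize: int = 4096) -> bytes:
    """Receive data, then re-request quick ACKs (assumes the flag may reset)."""
    data = sock.recv(bufsize)
    request_quick_ack(sock)
    return data
```

This only affects how quickly the local side acknowledges incoming data. It does nothing about the peer's send-side Nagle buffering, which still needs TCP_NODELAY on the sender.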
Should Nagle Still Exist / Be Default?
- One camp: Nagle is “outmoded,” should be off by default, and policy should live in applications, which can buffer themselves.
- Another camp: it still protects shared/cellular/wifi links from floods of tiny packets and helps poorly written or unmaintained software.
- Some argue the kernel must arbitrate tradeoffs between competing apps; others say this is the app’s responsibility.
- Side effect: disabling Nagle can increase fingerprinting risk by exposing fine-grained timing (e.g., keystroke patterns).
APIs, “Flush,” and Message Orientation
- Many lament that the stream-based socket API lacks a proper “flush now” for TCP, making mixed interactive/bulk use awkward.
- TCP_CORK, MSG_MORE, and buffered writers are cited as partial workarounds, but portability is limited.
- Several argue TCP APIs should have been message-oriented from the start; instead, every protocol reimplements framing on top of a byte stream.
- SCTP and QUIC are mentioned as more message-like alternatives, but lack broad OS-level, general-purpose adoption.
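The workarounds mentioned in this section can be sketched in a few lines. The first class is a portable user-space buffered writer whose `flush()` hands the kernel one complete message; the second is a context manager that corks the socket where TCP_CORK is available and silently does nothing elsewhere. Both names (`FlushingWriter`, `corked`) are hypothetical, and this is an illustration under those assumptions rather than a recommended design.

```python
import socket
from contextlib import contextmanager


class FlushingWriter:
    """Buffer small writes in user space; send them in one call on flush()."""

    def __init__(self, sock):
        self._sock = sock
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data

    def flush(self) -> None:
        if self._buf:
            # One sendall() per message: combined with TCP_NODELAY this
            # approximates the "flush now" the socket API lacks.
            self._sock.sendall(bytes(self._buf))
            self._buf.clear()


@contextmanager
def corked(sock):
    """Hold back partial frames while the block runs (no-op without TCP_CORK)."""
    opt = getattr(socket, "TCP_CORK", None)  # not exposed on every platform
    if opt is not None:
        sock.setsockopt(socket.IPPROTO_TCP, opt, 1)
    try:
        yield sock
    finally:
        if opt is not None:
            # Clearing the option releases whatever was queued.
            sock.setsockopt(socket.IPPROTO_TCP, opt, 0)
```

A typical use is to `write()` a header and a body separately and call `flush()` once, so the two pieces leave as a single write regardless of platform.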
Alternatives & Generic Batching
- Suggestions to use UDP (or QUIC, Aeron, ENet, MoldUDP-style protocols) when you control both ends and can implement reliability/ordering as needed.
- One commenter reframes Nagle and delayed ACK as poor special cases of a more general “work-or-time” batching strategy with explicit latency bounds.
- Related lower-level analogy: interrupt moderation on NICs, which is also a batching vs latency tradeoff.
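The "work-or-time" idea can be made concrete with a small sketch: flush a batch as soon as enough work has accumulated, or as soon as the oldest pending item has waited a bounded time, whichever comes first. The class name `WorkOrTimeBatcher`, its parameters, and the default limits are all assumptions chosen for illustration; the commenter did not give an implementation.

```python
import time


class WorkOrTimeBatcher:
    """Flush when max_bytes is reached OR the oldest item is max_delay old."""

    def __init__(self, flush, max_bytes=1400, max_delay=0.005,
                 clock=time.monotonic):
        self._flush = flush          # callable that receives one bytes batch
        self._max_bytes = max_bytes  # "work" bound
        self._max_delay = max_delay  # "time" bound, in seconds
        self._clock = clock          # injectable for testing
        self._pending = []
        self._size = 0
        self._first_at = 0.0

    def add(self, item: bytes) -> None:
        if not self._pending:
            # The latency bound is measured from the oldest pending item.
            self._first_at = self._clock()
        self._pending.append(item)
        self._size += len(item)
        if self._size >= self._max_bytes:
            self._flush_now()

    def poll(self) -> None:
        """Call from a timer or event loop; flushes once the deadline passes."""
        if self._pending and self._clock() - self._first_at >= self._max_delay:
            self._flush_now()

    def _flush_now(self) -> None:
        batch = b"".join(self._pending)
        self._pending.clear()
        self._size = 0
        self._flush(batch)
```

Unlike Nagle, the delay here is bounded by an explicit, application-chosen `max_delay` rather than by when the peer's ACK happens to arrive.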
Ethernet, CSMA, and Legacy Networks (Side Thread)
- Long subdiscussion on CSMA/CD vs CSMA/CA, hubs vs switches, full duplex, PAUSE frames, and why collisions effectively don’t exist on modern switched, full‑duplex Ethernet.
- Some corrections that Nagle is a TCP-layer mechanism and not directly about CSMA, though both historically addressed inefficient use of shared media.