It's Always TCP_NODELAY

Practical Experiences & Performance Wins

  • Multiple commenters report big latency improvements after disabling Nagle via TCP_NODELAY in:
    • Chatty protocols (e.g., DICOM on LAN, database client libraries, student TCP simulators, SSH-based games).
    • Cases where messages are ready in user space but sit unsent due to kernel buffering.
  • Go is noted as disabling Nagle by default, which surprised some who were debugging latency.
  • Some mention using LD_PRELOAD hacks or libraries (e.g., libnodelay) to force TCP_NODELAY for legacy binaries.
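For reference, disabling Nagle on one socket takes a single setsockopt call. Below is a minimal Python sketch, assuming a plain TCP client socket. The helper names are hypothetical, and the throwaway loopback listener exists only so the example is self-contained.

```python
import socket


def make_nodelay_client(host: str, port: int) -> socket.socket:
    """Open a TCP connection with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.create_connection((host, port))
    # TCP_NODELAY = 1: send small segments immediately instead of holding
    # them until previously sent data has been acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock: socket.socket) -> bool:
    """Read the option back; a nonzero value means Nagle is off for this socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # Hypothetical loopback listener on an ephemeral port, just for the demo.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        with make_nodelay_client("127.0.0.1", port) as client:
            print("TCP_NODELAY enabled:", nodelay_enabled(client))
```

In Go, `(*net.TCPConn).SetNoDelay` already defaults to true, which matches the observation that Go disables Nagle out of the box.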

Nagle vs Delayed ACK & TCP_QUICKACK

  • A recurring theme is that Nagle’s algorithm and delayed ACKs interact badly:
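    • Nagle holds a small segment while earlier data is still unacknowledged; the receiver's delayed ACK holds its ACK, hoping to piggyback it on response data. Each side waits on the other until the delayed-ACK timer fires, causing 100–200ms stalls or worse.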
  • Historical context: early TCP stacks used long global ACK timers (~500 ms).
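  • TCP_QUICKACK can reduce receive-side ACK delay but doesn’t fix send-side buffering. Portability across OSes is uneven (TCP_QUICKACK itself is a Linux option), and on Linux the flag is not permanent, so the stack can fall back into delayed-ACK mode later.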
  • One suggestion: TCP stacks should track whether delayed ACKs actually get piggybacked and disable them per-socket when they don’t.
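The Nagle/delayed-ACK stall can be made concrete with a toy model. This is a simplified sketch, not a real TCP stack. It assumes a client that does write-write-read (for example a header, then a body), a server that replies only once it has both pieces, a fixed one-way delay, and a delayed-ACK timer that fires a fixed time after the header arrives. The function names, parameters, and default sizes are hypothetical.

```python
def nagle_allows_send(segment_len: int, mss: int, unacked_bytes: int, nodelay: bool) -> bool:
    """Simplified Nagle rule: a sub-MSS segment waits while earlier data is unacknowledged."""
    if nodelay or segment_len >= mss:
        return True
    return unacked_bytes == 0


def write_write_read_latency(one_way_ms: float, ack_delay_ms: float, nodelay: bool,
                             mss: int = 1460, header_len: int = 40, body_len: int = 60) -> float:
    """Toy timeline: ms from the client's first write until the server's reply arrives.

    The header always leaves at t=0 because nothing is in flight yet. The server
    replies only after it has both header and body.
    """
    header_arrives = one_way_ms
    if nagle_allows_send(body_len, mss, unacked_bytes=header_len, nodelay=nodelay):
        # The body follows the header immediately.
        body_arrives = one_way_ms
    else:
        # The body is held until the header is ACKed. The server cannot reply
        # yet (it needs the body), so it has no data to piggyback the ACK on
        # and sends a bare ACK only when its delayed-ACK timer fires.
        ack_arrives = header_arrives + ack_delay_ms + one_way_ms
        body_arrives = ack_arrives + one_way_ms
    # The reply goes out as soon as the full request is in.
    return body_arrives + one_way_ms


print(write_write_read_latency(0.5, 200, nodelay=False))  # 202.0
print(write_write_read_latency(0.5, 200, nodelay=True))   # 1.0
```

On a 0.5 ms LAN path, the model turns a ~1 ms exchange into ~200 ms, almost entirely the delayed-ACK timer. Real stacks differ in timer values and ACK heuristics, so the numbers only illustrate the shape of the problem.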

Should Nagle Still Exist / Be Default?

  • One camp: Nagle is “outmoded,” should be off by default, and policy should live in applications, which can buffer themselves.
  • Another camp: it still protects shared/cellular/wifi links from floods of tiny packets and helps poorly written or unmaintained software.
  • Some argue the kernel must arbitrate tradeoffs between competing apps; others say this is the app’s responsibility.
  • Side effect: disabling Nagle can increase fingerprinting risk by exposing fine-grained timing (e.g., keystroke patterns).

APIs, “Flush,” and Message Orientation

  • Many lament that the stream-based socket API lacks a proper “flush now” for TCP, making mixed interactive/bulk use awkward.
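  • TCP_CORK, MSG_MORE, and buffered writers are cited as partial workarounds, but portability is limited: TCP_CORK and MSG_MORE are Linux-specific, and BSD-derived systems offer the similar TCP_NOPUSH instead.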
  • Several argue TCP APIs should have been message-oriented from the start; instead, every protocol reimplements framing on top of a byte stream.
  • SCTP and QUIC are mentioned as more message-like alternatives, but lack broad OS-level, general-purpose adoption.
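Several of these workarounds amount to doing framing and batching in user space. Below is a minimal Python sketch, assuming a 4-byte big-endian length prefix (an illustrative choice, not any particular protocol's format) and, in real use, a TCP socket that already has TCP_NODELAY set. Messages accumulate in an application buffer, and `flush()` hands them to the kernel in one `sendall()` call. `FramedWriter` and `read_message` are hypothetical names, and `socketpair()` stands in for a TCP connection in the demo.

```python
import socket
import struct

_HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian length prefix


class FramedWriter:
    """Buffers length-prefixed messages; flush() is the explicit 'send now'."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock
        self._buf = bytearray()

    def write_message(self, payload: bytes) -> None:
        # Nothing reaches the kernel yet; this only appends to the local buffer.
        self._buf += _HEADER.pack(len(payload))
        self._buf += payload

    def flush(self) -> None:
        # One sendall() per flush, so several small messages can share segments
        # even with Nagle disabled.
        if self._buf:
            self._sock.sendall(self._buf)
            self._buf.clear()


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() on a byte stream may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_message(sock: socket.socket) -> bytes:
    """Recover one message boundary from the byte stream."""
    (length,) = _HEADER.unpack(_recv_exactly(sock, _HEADER.size))
    return _recv_exactly(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()  # stand-in for a connected TCP socket pair
    writer = FramedWriter(a)
    writer.write_message(b"SELECT 1")
    writer.write_message(b"SELECT 2")
    writer.flush()
    print(read_message(b), read_message(b))  # b'SELECT 1' b'SELECT 2'
    a.close()
    b.close()
```

On Linux, setting TCP_CORK before a burst of writes and clearing it afterwards is a kernel-side version of the same idea: clearing the option sends any queued partial frames.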

Alternatives & Generic Batching

  • Suggestions to use UDP (or QUIC, Aeron, ENet, MoldUDP-style protocols) when you control both ends and can implement reliability/ordering as needed.
  • One commenter reframes Nagle and delayed ACK as poor special cases of a more general “work-or-time” batching strategy with explicit latency bounds.
  • Related lower-level analogy: interrupt moderation on NICs, also a batching-vs-latency tradeoff.
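One reading of the "work-or-time" strategy: flush when enough work has accumulated or when the oldest pending item has waited for the latency bound, whichever comes first. Below is a minimal Python sketch under that reading. The class name, the thresholds, and the injectable clock (used so the behavior is deterministic in tests) are illustrative assumptions, not something specified in the thread.

```python
import time
from typing import Callable, List, Optional


class WorkOrTimeBatcher:
    """Flush when max_items are pending or the oldest item has waited max_delay seconds."""

    def __init__(self, flush_fn: Callable[[List[bytes]], None], max_items: int,
                 max_delay: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._flush_fn = flush_fn
        self._max_items = max_items
        self._max_delay = max_delay
        self._clock = clock
        self._pending: List[bytes] = []
        self._oldest: Optional[float] = None  # arrival time of the oldest pending item

    def add(self, item: bytes) -> None:
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(item)
        if len(self._pending) >= self._max_items:  # work bound reached
            self._flush()

    def poll(self) -> None:
        """Call periodically (e.g., from an event-loop timer) to enforce the time bound."""
        if self._pending and self._clock() - self._oldest >= self._max_delay:
            self._flush()

    def _flush(self) -> None:
        batch, self._pending, self._oldest = self._pending, [], None
        self._flush_fn(batch)
```

The latency bound is explicit: no item waits longer than `max_delay` plus the polling interval. For comparison, Nagle's trigger is "a full segment is ready or all outstanding data is acknowledged", so its wait has no bound of its own and inherits whatever the peer's delayed-ACK timer does.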

Ethernet, CSMA, and Legacy Networks (Side Thread)

  • Long subdiscussion on CSMA/CD vs CSMA/CA, hubs vs switches, full duplex, PAUSE frames, and why collisions effectively don’t exist on modern switched, full-duplex Ethernet.
  • Some commenters point out that Nagle is a TCP-layer mechanism and not directly about CSMA, though both historically addressed inefficient use of shared media.