It's time to replace TCP in the datacenter (2023)

Status of TCP in Datacenters

  • Many commenters assert TCP is still dominant and “good enough” for most DC workloads; bottlenecks are often bandwidth or hardware, not the protocol.
  • Some claim large cloud/FAANG-like operators already use non‑TCP transports internally (QUIC, SRD, RDMA, custom stacks), but others are skeptical these replace TCP “entirely” rather than in niches.
  • Broad agreement that any replacement will likely stay confined to niches such as ultra‑low‑latency or AI/HPC workloads.

Homa’s Goals and Concerns

  • Homa is seen as an RPC‑optimized, receiver‑driven, low‑latency transport aiming to eliminate core congestion and head‑of‑line blocking.
  • Skeptics question how it behaves under “systemic overload” and whether receiver‑driven grants could enable denial‑of‑service if receivers misbehave.
  • Some note TCP already allows receiver‑side control (window size) and doubt Homa’s gains justify a new L4 protocol.
  • Others argue Homa can significantly beat DCTCP and similar TCP variants on latency, but this is considered a specialized benefit.
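
To make the "receiver‑driven grants" idea above concrete, here is a toy sketch, not Homa's actual implementation. It assumes senders may transmit a small unscheduled prefix on their own, after which the receiver hands out grants and favors the message with the fewest bytes still ungranted. The constants, class, and function names are all hypothetical, and real concerns such as packet priorities, overcommitment, and loss are ignored.

```python
from dataclasses import dataclass

UNSCHEDULED_BYTES = 10_000  # assumed: bytes a sender may send without a grant
GRANT_BYTES = 10_000        # assumed: bytes authorized by each grant


@dataclass
class InboundMessage:
    name: str
    total: int        # message length in bytes
    granted: int = 0  # bytes the sender is currently allowed to send

    def __post_init__(self):
        # The first chunk is sent blindly, without waiting for a grant.
        self.granted = min(self.total, UNSCHEDULED_BYTES)

    @property
    def remaining(self) -> int:
        return self.total - self.granted


def issue_grants(messages):
    """Return message names in the order they become fully granted.

    Each round the receiver grants GRANT_BYTES to the pending message
    with the fewest ungranted bytes (a shortest-remaining-first policy).
    """
    completed = [m.name for m in messages if m.remaining == 0]
    pending = [m for m in messages if m.remaining > 0]
    while pending:
        msg = min(pending, key=lambda m: m.remaining)
        msg.granted = min(msg.total, msg.granted + GRANT_BYTES)
        if msg.remaining == 0:
            completed.append(msg.name)
            pending.remove(msg)
    return completed


if __name__ == "__main__":
    inbound = [
        InboundMessage("bulk", 500_000),
        InboundMessage("rpc", 15_000),
        InboundMessage("tiny", 2_000),
    ]
    print(issue_grants(inbound))
```

In this sketch the short messages finish ahead of the bulk transfer regardless of arrival order, which is the latency argument made for receiver‑driven designs. It also shows why the skeptics' question matters: the receiver alone decides who may send, so a misbehaving or overloaded receiver shapes everyone's progress.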

Alternatives and Related Technologies

  • QUIC (and HTTP/3 on top of it) repeatedly mentioned: user‑space, UDP‑based, multiplexed, with better latency for web/RPC, but solving different problems than Homa.
  • RDMA (RoCE, InfiniBand) widely used for low‑latency and AI training; however, it has scaling, cost, and vendor lock‑in concerns.
  • Fibre Channel praised for rock‑solid, receiver‑controlled storage transport but criticized as very expensive, niche, and lagging in speeds.
  • Ultra Ethernet Consortium work cited as evidence the industry is designing new Ethernet/IP‑friendly transports for AI networks.

Security and Operational Complexity

  • Several note the paper mostly ignores encryption, inspection, multi‑tenant isolation, and DoS resistance, all “table stakes” in real DCs.
  • Mixing TCP at the DC edge with Homa inside is seen as troubleshooting hell, especially across L4/L7 load balancers and SDN overlays.

Adoption Barriers and Standardization

  • Strong theme: protocol “ossification” and inertia. TCP persists because:
    • Universally supported in hardware and software.
    • Skills, tooling, and battle‑tested behaviors exist.
    • New protocols must be vastly better to overcome migration pain.
  • Comparisons to IPv6 adoption: even clear technical wins struggle without huge, immediate value.

Side Discussions

  • WebSockets vs raw TCP: some like WebSockets’ message framing; others find it inefficient and unnecessary outside web contexts.
  • Game dev and some DC/HPC folks avoid TCP in favor of UDP/RDMA, but this is acknowledged as a minority, high‑specialization domain.
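
On the framing point in the WebSockets bullet: TCP delivers a byte stream with no message boundaries, so applications on raw TCP usually add their own framing. Below is a minimal sketch of one common approach, a length prefix. The 4‑byte big‑endian header and the helper names are assumptions for illustration, not a format taken from the discussion or from the WebSocket specification.

```python
import socket
import struct

# Assumed wire format: 4-byte big-endian length, then the payload.
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one framed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Receive one framed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # socketpair() gives a connected pair of stream sockets, which is
    # enough to exercise the framing without opening a real TCP port.
    a, b = socket.socketpair()
    send_message(a, b"hello")
    send_message(a, b"world!")
    print(recv_message(b), recv_message(b))
    a.close()
    b.close()
```

This is roughly the work WebSockets do for the application, plus an HTTP handshake and client‑side masking, which is where the "inefficient outside web contexts" complaint comes from.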