It's time to replace TCP in the datacenter (2023)
Status of TCP in Datacenters
- Many commenters assert TCP is still dominant and “good enough” for most DC workloads; bottlenecks are often bandwidth or hardware, not the protocol.
- Some claim large cloud/FAANG-like operators already use non‑TCP transports internally (QUIC, SRD, RDMA, custom stacks), but others are skeptical these replace TCP “entirely” rather than in niches.
- There’s consensus that any replacement is likely to remain a niche for ultra‑low‑latency or AI/HPC workloads.
Homa’s Goals and Concerns
- Homa is seen as an RPC‑optimized, receiver‑driven, low‑latency transport aiming to eliminate core congestion and head‑of‑line blocking.
- Skeptics question how it behaves under “systemic overload” and whether receiver‑driven grants could enable denial‑of‑service if receivers misbehave.
- Some note TCP already allows receiver‑side control (window size) and doubt Homa’s gains justify a new L4 protocol.
- Others argue Homa can significantly beat DCTCP and similar TCP variants on latency, but this is considered a specialized benefit.
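The receiver‑driven idea debated above can be made concrete with a toy model. This is a minimal sketch, not Homa's implementation: the names `InboundMessage` and `issue_grants` and the parameters `grant_bytes` and `unscheduled_bytes` are hypothetical, and it ignores network priorities, overcommitment, and packet loss. It assumes only that each sender may transmit a small prefix without permission and that the receiver then grants the rest, favoring the message with the fewest bytes left.

```python
from dataclasses import dataclass


@dataclass
class InboundMessage:
    msg_id: str
    total_bytes: int
    granted: int = 0  # bytes the sender has been allowed to send so far

    @property
    def remaining(self) -> int:
        return self.total_bytes - self.granted


def issue_grants(messages, grant_bytes, unscheduled_bytes):
    """Return message ids in the order they become fully granted."""
    # Assumption: a short prefix may be sent without waiting for a grant.
    for m in messages:
        m.granted = min(m.total_bytes, unscheduled_bytes)
    completed = [m.msg_id for m in messages if m.remaining == 0]
    pending = [m for m in messages if m.remaining > 0]
    while pending:
        # The receiver decides who sends next: the message with the
        # fewest ungranted bytes gets the next grant.
        nxt = min(pending, key=lambda m: m.remaining)
        nxt.granted += min(grant_bytes, nxt.remaining)
        if nxt.remaining == 0:
            completed.append(nxt.msg_id)
            pending.remove(nxt)
    return completed


if __name__ == "__main__":
    inbound = [
        InboundMessage("bulk", 100_000),
        InboundMessage("tiny", 500),
        InboundMessage("medium", 20_000),
    ]
    print(issue_grants(inbound, grant_bytes=10_000, unscheduled_bytes=1_000))
```

In this model short messages finish ahead of a large transfer that arrived first, which is the latency benefit proponents point to. It also makes the skeptics' concern visible: all pacing authority sits in the receiver's grant loop, so a receiver that withholds or misissues grants directly controls its senders.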
Alternatives and Related Technologies
- QUIC/HTTP/3 is mentioned repeatedly: user‑space, UDP‑based, multiplexed, with better latency for web/RPC, but it solves different problems than Homa.
- RDMA (RoCE, InfiniBand) is widely used for low‑latency workloads and AI training; however, it raises scaling, cost, and vendor lock‑in concerns.
- Fibre Channel praised for rock‑solid, receiver‑controlled storage transport but criticized as very expensive, niche, and lagging in speeds.
- Ultra Ethernet Consortium work cited as evidence the industry is designing new Ethernet/IP‑friendly transports for AI networks.
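The head‑of‑line blocking and multiplexing points raised in this thread can be illustrated with a toy delivery model. This is a sketch under stated assumptions, not a description of any real stack: `delivery_times` is a hypothetical helper, packets are `(seq, stream, arrival_time)` tuples, and a late arrival time stands in for a lost‑then‑retransmitted packet. With one ordering domain per connection (TCP‑like), a late packet delays everything sent after it; with one ordering domain per stream (QUIC‑like), only its own stream waits.

```python
def delivery_times(packets, per_stream):
    """Map each packet's seq to the time it can be handed to the application.

    packets: iterable of (seq, stream, arrival_time); seq is the send order.
    per_stream: if True, ordering is enforced only within each stream.
    """
    delivered = {}
    latest = {}  # ordering domain -> latest arrival time seen so far
    for seq, stream, arrival in sorted(packets):
        domain = stream if per_stream else "connection"
        # In-order delivery: a packet cannot be delivered before every
        # earlier packet in the same ordering domain has arrived.
        latest[domain] = max(latest.get(domain, 0.0), arrival)
        delivered[seq] = latest[domain]
    return delivered


if __name__ == "__main__":
    packets = [
        (0, "a", 1.0),
        (1, "b", 2.0),
        (2, "a", 30.0),  # stands in for a lost, retransmitted packet
        (3, "b", 4.0),
    ]
    print(delivery_times(packets, per_stream=False))
    print(delivery_times(packets, per_stream=True))
```

In the single‑domain case, packet 3 on stream "b" is held until the late packet on stream "a" arrives, even though the two are unrelated.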
Security and Operational Complexity
- Several note the paper mostly ignores encryption, inspection, multi‑tenant isolation, and DoS resistance, all “table stakes” in real DCs.
- Mixing TCP at the DC edge with Homa inside is seen as troubleshooting hell, especially across L4/L7 load balancers and SDN overlays.
Adoption Barriers and Standardization
- Strong theme: protocol “ossification” and inertia. TCP persists because:
  - It is universally supported in hardware and software.
  - Skills, tooling, and battle‑tested behaviors exist.
  - New protocols must be vastly better to overcome migration pain.
- Comparisons to IPv6 adoption: even clear technical wins struggle without huge, immediate value.
Side Discussions
- WebSockets vs raw TCP: some like WebSockets’ message framing; others find it inefficient and unnecessary outside web contexts.
- Game dev and some DC/HPC folks avoid TCP in favor of UDP/RDMA, but this is acknowledged as a minority, high‑specialization domain.
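On the WebSockets versus raw TCP point: TCP delivers an unframed byte stream, so applications that want message boundaries have to add them. A minimal sketch of one common approach, a length prefix, is below. The 4‑byte big‑endian header and the helper names `send_message` and `recv_message` are assumptions chosen for illustration; this is not the WebSocket wire format.

```python
import socket
import struct

# Assumed framing: each message is preceded by its length as a
# 4-byte unsigned big-endian integer.
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until done.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock: socket.socket) -> bytes:
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)


if __name__ == "__main__":
    left, right = socket.socketpair()  # connected stream sockets for a local demo
    send_message(left, b"hello")
    send_message(left, b"world!")
    print(recv_message(right), recv_message(right))
    left.close()
    right.close()
```

The two messages come back as separate units even though they may share a single underlying segment, which is the convenience WebSockets provide out of the box and the overhead critics consider unnecessary outside the browser.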