2024-07-08

C++ patterns for low-latency applications including high-frequency trading

Practical low-latency patterns & ring buffers

Several comments discuss using the LMAX Disruptor pattern and single-producer/single-consumer (SPSC) ring buffers.
Key implementation pitfalls noted: producer and consumer indices must be std::atomic with correct acquire/release memory ordering, pointers into a slot must not be handed out after that slot has been released, and the two indices should sit on separate cache lines to prevent false sharing.
Optimization tip: use power-of-two ring sizes and bitmasking instead of modulo; treat indices as ever-increasing sequence numbers.
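The points above can be put together in a minimal sketch of an SPSC ring. This is not code from the thread: the class name SpscRing, the 64-byte cache-line size, and the requirement that T be default-constructible and copyable are all assumptions made for illustration.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical SPSC ring: exactly one producer thread and one consumer thread.
// Assumes a 64-byte cache line and a default-constructible, copyable T.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity != 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::uint64_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store to head_.
        const std::uint64_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;
        slots_[tail & kMask] = value;  // bitmask instead of modulo
        // Release publishes the slot write before the new tail is visible.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Copies the element out, so no pointer into the ring
    // survives after the slot is released. Returns false when empty.
    bool try_pop(T& out) {
        const std::uint64_t head = head_.load(std::memory_order_relaxed);
        const std::uint64_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[head & kMask];
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

private:
    static constexpr std::uint64_t kMask = Capacity - 1;
    // Indices are ever-increasing sequence numbers, each on its own cache
    // line so the producer and consumer do not false-share.
    alignas(64) std::atomic<std::uint64_t> head_{0};
    alignas(64) std::atomic<std::uint64_t> tail_{0};
    alignas(64) std::array<T, Capacity> slots_{};
};

// Single-threaded usage example (real use has one thread per side).
inline void spsc_ring_example() {
    SpscRing<int, 4> ring;
    for (int i = 0; i < 4; ++i) {
        const bool pushed = ring.try_push(i);
        assert(pushed);
        (void)pushed;
    }
    int out = -1;
    const bool popped = ring.try_pop(out);
    assert(popped && out == 0);
    (void)popped;
}
```

Because the indices only ever grow, full and empty are distinguished by the difference tail - head, and no slot has to be left unused.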

C++ vs Rust and memory management

One practitioner built a C++ stock exchange using a disruptor-style queue and is rewriting it in Rust, citing easier memory management and dependency handling for solo projects.
Others warn that complex concurrent data structures are hard to get right in C++ and that Rust can be slower to iterate on designs due to the borrow checker, though fun to use.
Debate over std::shared_ptr: some claim it “won’t slow anything down,” others emphasize that it does carry atomic reference-counting overhead and should be used sparingly, often hidden behind APIs. std::unique_ptr and explicit ownership modeling are strongly advocated.
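A small sketch of the ownership style advocated here, under assumed names (Order, OrderBook, notional, and use_count_after_copy are hypothetical and not from the thread). The owner holds a std::unique_ptr, hot-path code borrows by reference, and the std::shared_ptr helper only makes the reference-count update visible.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Order {  // hypothetical payload type
    long id;
    long qty;
};

// Hot-path function borrows the object: no ownership, no refcount traffic.
inline long notional(const Order& order, long price) {
    return order.qty * price;
}

class OrderBook {
public:
    // Taking unique_ptr by value makes the transfer explicit: callers
    // must write std::move, and exactly one owner exists afterwards.
    void adopt(std::unique_ptr<Order> order) {
        assert(order != nullptr);
        last_ = std::move(order);
    }
    const Order* last() const { return last_.get(); }

private:
    std::unique_ptr<Order> last_;
};

// Copying a shared_ptr updates the shared reference count (typically an
// atomic operation in multi-threaded programs); this is the overhead the
// thread argues about.
inline long use_count_after_copy(const std::shared_ptr<Order>& p) {
    std::shared_ptr<Order> copy = p;  // count goes up by one here
    return copy.use_count();          // and back down when copy is destroyed
}
```

Whether the reference-count cost matters depends on how often copies happen on the hot path, which is why passing by reference and keeping shared ownership at the edges is a common compromise.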

HFT architectures, OS, and hardware tricks

Descriptions of production setups: colocation at exchanges, offloading TCP and network stack work to hardware, multicast logging, redundant hot-standby systems, and massive historical data replay for backtesting.
Techniques include core pinning, isolating cores and caches, minimizing context switches, sometimes running networking in user space, and careful PCIe/NIC layout.
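Core pinning, the first technique listed, can be sketched as follows. This assumes Linux with glibc (sched_getaffinity and sched_setaffinity are Linux-specific), and the helper name pin_to_first_allowed_cpu is hypothetical. A production setup would normally choose a specific isolated core rather than the first allowed one.

```cpp
#include <cassert>
#include <sched.h>  // Linux/glibc; g++ defines _GNU_SOURCE by default

// Pins the calling thread to the first CPU in its current affinity mask.
// Returns the chosen CPU id, or -1 on failure.
inline int pin_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        cpu_set_t target;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        if (sched_setaffinity(0, sizeof(target), &target) != 0) return -1;
        return cpu;
    }
    return -1;  // empty mask; not expected in practice
}
```

Pinning alone does not keep other tasks off the core; that is what the core-isolation step mentioned above is for, and it is configured outside the program.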
FPGAs are widely used for ultra-low-latency tick-to-trade; for some flows they can act before a full packet is received. Software then handles slower, more complex logic.

Performance techniques & compiler behavior

Discussion of compile-time vs runtime dispatch: static dispatch enables inlining and further optimizations, but excessive inlining can hurt instruction cache; measurement is essential.
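To make the dispatch distinction concrete, here is a sketch with assumed names (Tick, Handler, VolumeHandler, VolumeCounter, run_dynamic, run_static are all hypothetical). Both loops compute the same result; they differ only in how the call to on_tick is resolved.

```cpp
#include <cassert>
#include <cstdint>

struct Tick {  // hypothetical market-data record
    std::int64_t price;
    std::int64_t qty;
};

// Runtime dispatch: the call goes through the vtable unless the compiler
// can prove the dynamic type and devirtualize it.
struct Handler {
    virtual ~Handler() = default;
    virtual std::int64_t on_tick(const Tick& t) = 0;
};

struct VolumeHandler final : Handler {
    std::int64_t total = 0;
    std::int64_t on_tick(const Tick& t) override {
        total += t.qty;
        return total;
    }
};

inline std::int64_t run_dynamic(Handler& h, const Tick* ticks, int n) {
    assert(n >= 0);
    std::int64_t last = 0;
    for (int i = 0; i < n; ++i) last = h.on_tick(ticks[i]);
    return last;
}

// Compile-time dispatch: the handler type is a template parameter, so the
// call target is known statically and is a candidate for inlining.
struct VolumeCounter {
    std::int64_t total = 0;
    std::int64_t on_tick(const Tick& t) {
        total += t.qty;
        return total;
    }
};

template <typename H>
std::int64_t run_static(H& h, const Tick* ticks, int n) {
    assert(n >= 0);
    std::int64_t last = 0;
    for (int i = 0; i < n; ++i) last = h.on_tick(ticks[i]);
    return last;
}
```

Each distinct H instantiates its own copy of run_static, which is where the instruction-cache concern comes from; as the thread says, only measurement settles which variant is faster for a given workload.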
Mention of PGO, LTO, and branch prediction hints; some consider hints marginal or even counterproductive compared to PGO.
General low-latency mindset: avoid allocations and copies in hot paths, keep data in cache, be paranoid and profile (e.g., callgrind).
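One common way to avoid allocations in the hot path is to preallocate everything up front and recycle it. The sketch below is an assumption-laden illustration, not something proposed in the thread: FixedPool is a hypothetical single-threaded pool that requires a default-constructible T and reuses objects without reconstructing them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity pool: all storage lives inside the object,
// so acquire() and release() never call into the heap allocator.
// Single-threaded; objects are recycled as-is, not re-initialized.
template <typename T, std::size_t N>
class FixedPool {
public:
    FixedPool() : free_count_(N) {
        // Stack of free slot indices, arranged so slot 0 is handed out first.
        for (std::size_t i = 0; i < N; ++i) free_[i] = N - 1 - i;
    }

    // Returns nullptr when the pool is exhausted.
    T* acquire() {
        if (free_count_ == 0) return nullptr;
        return &storage_[free_[--free_count_]];
    }

    // p must have come from acquire() on this pool and not been released yet.
    void release(T* p) {
        assert(p >= storage_.data() && p < storage_.data() + N);
        assert(free_count_ < N);
        free_[free_count_++] = static_cast<std::size_t>(p - storage_.data());
    }

    std::size_t available() const { return free_count_; }

private:
    std::array<T, N> storage_{};
    std::array<std::size_t, N> free_{};
    std::size_t free_count_;
};
```

The most recently released slot is handed out next, which tends to keep the reused object warm in cache.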

Relation to other domains (Java, games, audio)

The original LMAX Disruptor was written in Java; with careful avoidance of allocation and GC, Java can be competitive, and some shops even disable GC and restart daily.
Parallels drawn to game dev and real-time audio: similar focus on cache locality and predictable latency, but HFT operates on much shorter timescales (micro/nanoseconds vs milliseconds).

Value, risks, and regulation of HFT

Some view HFT as socially wasteful; others argue it narrows spreads, increases liquidity, and replaces a larger, less efficient human middleman industry.
Debate over whether many strategies truly “provide liquidity” vs remove it, and over tactics that induce responses in other bots.
Ideas floated: order-cancel taxes, time-quantized or randomized batch auctions; critics worry about unintended consequences and shadow/black markets.
Several emphasize that regulators already constrain cancellation rates and that empirical data (from tagged exchange feeds) shows declining net liquidity costs over time, suggesting markets are becoming more efficient.

Evaluation of the paper and learning resources

Some find the paper an “excellent intro,” especially given the scarcity of consolidated low-latency C++ material.
Others criticize it as a trivial recap of well-known micro-optimizations, with unrepresentative examples (e.g., 65µs “inner loops”) and possibly LLM-written prose.
Alternative learning paths suggested: conference talks (esp. C++ and game dev), FPGA/HFT verification writeups, performance-oriented blogs, and a trading microstructure book (“Trades, Quotes and Prices”) for those interested in the market side.
