Replacing Protobuf with Rust
Headline & Rust Discourse
- Many see the title as misleading or “devbait”: the speedup comes from avoiding Protobuf-based serialization, not from Rust itself.
- Several argue the post rides on the “rewrite in Rust” meme for attention; others counter that such titles now mainly attract Rust skeptics and ragebait engagement.
- Some note the irony that Rust is part of the problem (needing a Protobuf-based bridge to C), and the work is actually about reducing Rust’s overhead in this setup.
What Actually Changed
- The old design: Rust code talked to a C library (the Postgres query parser) via a Protobuf-based API, serializing the AST in C and deserializing it in Rust to cross the language boundary within the same process.
- The new design: a fork that replaces Protobuf with direct C↔Rust bindings and in-memory data sharing.
- Commenters stress: this is effectively “replacing Protobuf-as-FFI with real FFI,” not “Rust is 5x faster than Protobuf.”
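The difference between the two designs can be sketched in Rust. This is a hypothetical illustration only, not pg_query's actual API: `CNode`, `Node`, `encode`, `decode`, and `read_direct` are invented names, and the hand-rolled byte format merely stands in for Protobuf.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical C-layout AST node, standing in for what a C parser returns.
#[repr(C)]
pub struct CNode {
    pub kind: u32,
    pub name: *const c_char, // NUL-terminated, owned by the "C" side
}

/// Owned Rust-side representation of the same node.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub kind: u32,
    pub name: String,
}

/// Old design, producer side: encode the node into a byte buffer
/// (4 little-endian bytes for `kind`, followed by the name bytes).
///
/// Safety: `node.name` must point to a valid NUL-terminated string.
pub unsafe fn encode(node: &CNode) -> Vec<u8> {
    let name = unsafe { CStr::from_ptr(node.name) }.to_bytes();
    let mut buf = Vec::with_capacity(4 + name.len());
    buf.extend_from_slice(&node.kind.to_le_bytes());
    buf.extend_from_slice(name);
    buf
}

/// Old design, consumer side: decode the buffer back into a `Node`.
pub fn decode(buf: &[u8]) -> Option<Node> {
    if buf.len() < 4 {
        return None;
    }
    let kind = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    let name = String::from_utf8(buf[4..].to_vec()).ok()?;
    Some(Node { kind, name })
}

/// New design: read the C struct directly through the pointer,
/// with no intermediate buffer.
///
/// Safety: `ptr` must be null or point to a valid `CNode` whose `name`
/// is a valid NUL-terminated string.
pub unsafe fn read_direct(ptr: *const CNode) -> Option<Node> {
    let node = unsafe { ptr.as_ref() }?;
    let name = unsafe { CStr::from_ptr(node.name) }.to_str().ok()?.to_owned();
    Some(Node { kind: node.kind, name })
}
```

Both paths yield the same `Node`; the first allocates and fills an intermediate buffer and then parses it again, while the second reads the fields in place. That extra round trip, repeated for every node of a real AST, is the kind of overhead the thread attributes the speedup to.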
Protobuf: Criticism & Defense
- Critics: using a wire-serialization format inside a single process is obviously wasteful; a 5x speedup shows the original architecture was “built wrong.”
- Stronger critics call Protobuf “a joke” performance-wise and advocate zero-copy formats (FlatBuffers, Cap’n Proto, Arrow, custom layouts, etc.).
- Defenders: Protobuf is already very fast for what it is; being only ~5x slower than a raw memory copy is, in their view, impressive.
- Ergonomics and tooling, not raw speed, are cited as the primary reasons to choose Protobuf:
  - Cross-language codegen and type safety.
  - Stable, evolvable contracts across teams and languages.
  - Good fit for IoT and binary-heavy workloads compared to JSON/XML.
Why Protobuf Was Used Here
- The pg_query library originally used JSON, then moved to Protobuf to provide typed bindings for multiple languages (Ruby, Go, Rust, Python, etc.).
- Direct FFI would be fine for Rust alone but would require substantial, language-specific glue elsewhere; Protobuf kept that simpler.
- For non–performance-critical uses, Protobuf is expected to remain in that ecosystem.
FFI vs Serialization
- Some ask why Protobuf was “in the middle” at all when C ABIs are widely available.
- Others explain: writing safe, high-quality bindings over complex C data structures is tedious and error-prone; serializing to a well-defined, owned format (Protobuf) sidesteps tricky ownership and pointer semantics.
- The new Rust bindings effectively take on that complexity for better performance.
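A minimal sketch of the ownership bookkeeping that direct bindings have to handle, under stated assumptions: `RawTree`, `tree_new`, `tree_free`, `Tree`, and `FREED` are hypothetical names, and the two `extern "C"` functions are written in Rust here only so the example is self-contained (in a real binding they would be declared and implemented on the C side).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical C-layout parse tree.
#[repr(C)]
pub struct RawTree {
    pub node_count: u32,
}

/// Counts calls to `tree_free`, purely so the example can be checked.
pub static FREED: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a C allocator function (hypothetical).
extern "C" fn tree_new(node_count: u32) -> *mut RawTree {
    Box::into_raw(Box::new(RawTree { node_count }))
}

/// Stand-in for the matching C free function (hypothetical).
extern "C" fn tree_free(ptr: *mut RawTree) {
    if !ptr.is_null() {
        // Reclaim the allocation made in `tree_new`.
        unsafe { drop(Box::from_raw(ptr)) };
        FREED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Safe wrapper: owns the C allocation and frees it exactly once.
pub struct Tree {
    raw: *mut RawTree,
}

impl Tree {
    pub fn parse(node_count: u32) -> Tree {
        Tree { raw: tree_new(node_count) }
    }

    /// Borrowed view into C-owned memory. The returned reference is tied
    /// to `&self`, so it cannot outlive the `Tree` that owns the pointer.
    pub fn view(&self) -> &RawTree {
        unsafe { &*self.raw }
    }
}

impl Drop for Tree {
    fn drop(&mut self) {
        tree_free(self.raw);
    }
}
```

With a serialized format, the consumer simply owns the decoded bytes and none of this is needed. With direct bindings, the wrapper must encode who frees the memory and how long borrowed views stay valid, which is the tedious part commenters point to.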
Performance & Appropriateness
- Multiple comments highlight the general lesson: big speedups often come from removing unnecessary serialization, not from switching languages.
- For typical “CRUD over strings/UUIDs” apps, several argue Protobuf (or even JSON) is usually fine and simpler; micro-optimizing ser/de is premature.
- In data- and compute-heavy domains (3D data, analytics, etc.), binary formats and zero-copy layouts can be crucial and justify the extra complexity.
Safety & Stability Concerns
- At least one commenter warns that shared-memory IPC/FFI is fragile and hard to keep stable; serialization exists partly to avoid these hazards.
- Others reply that in this case the Postgres “ABI” is relatively stable and the generated output is machine-verifiable, making the trade-off acceptable for this project.