Replacing Protobuf with Rust

Headline & Rust Discourse

  • Many see the title as misleading or “devbait”: the speedup comes from avoiding Protobuf-based serialization, not from Rust itself.
  • Several argue the post rides on the “rewrite in Rust” meme for attention; others counter that such titles now mainly attract Rust skeptics and ragebait engagement.
  • Some note the irony that Rust is part of the problem (needing a Protobuf-based bridge to C), and the work is actually about reducing Rust’s overhead in this setup.

What Actually Changed

  • The old design: Rust code talked to a C library (Postgres query parser) via a Protobuf-based API, serializing the AST to cross the C/Rust language boundary inside a single process.
  • The new design: a fork that replaces Protobuf with direct C↔Rust bindings and in-memory data sharing.
  • Commenters stress: this is effectively “replacing Protobuf-as-FFI with real FFI,” not “Rust is 5x faster than Protobuf.”
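
To make the "Protobuf-as-FFI versus real FFI" distinction concrete, here is a minimal sketch. Every name in it (Node, demo_parse, encode, decode, borrow_nodes) is hypothetical and not part of pg_query or its Rust bindings. The "C side" is written in Rust so the example is self-contained, and the byte encoding is a fixed-width stand-in, not the Protobuf wire format.

```rust
use std::slice;

/// Hypothetical stand-in for a parse-tree node laid out by the C side.
#[repr(C)]
pub struct Node {
    pub kind: i32,
    pub location: i32,
}

// Pretend this array lives in memory owned by the C library.
static NODES: [Node; 3] = [
    Node { kind: 1, location: 0 },
    Node { kind: 2, location: 7 },
    Node { kind: 3, location: 12 },
];

/// Stand-in for a C entry point that returns a pointer plus a length.
/// In a real project this would be C code declared in an `extern "C"` block.
pub unsafe extern "C" fn demo_parse(out_len: *mut usize) -> *const Node {
    // The caller must pass a valid, writable pointer.
    unsafe { *out_len = NODES.len() };
    NODES.as_ptr()
}

/// Path A, step 1: copy every node into a byte buffer (the "serialize" half).
pub fn encode(nodes: &[Node]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(nodes.len() * 8);
    for n in nodes {
        buf.extend_from_slice(&n.kind.to_le_bytes());
        buf.extend_from_slice(&n.location.to_le_bytes());
    }
    buf
}

/// Path A, step 2: rebuild owned values from the bytes (the "deserialize" half).
pub fn decode(buf: &[u8]) -> Vec<(i32, i32)> {
    buf.chunks_exact(8)
        .map(|c| {
            let kind = i32::from_le_bytes([c[0], c[1], c[2], c[3]]);
            let location = i32::from_le_bytes([c[4], c[5], c[6], c[7]]);
            (kind, location)
        })
        .collect()
}

/// Path B: borrow the C-side memory directly, with no copy and no allocation.
pub fn borrow_nodes() -> &'static [Node] {
    let mut len = 0usize;
    // SAFETY: `demo_parse` writes the length through a valid pointer and
    // returns a pointer to `len` initialized nodes in static storage.
    unsafe {
        let ptr = demo_parse(&mut len);
        slice::from_raw_parts(ptr, len)
    }
}
```

Both paths yield the same values: `decode(&encode(borrow_nodes()))` matches what you read straight from `borrow_nodes()`. Path A pays for an encode pass, a buffer, and a decode pass; path B pays with an `unsafe` block whose invariants someone has to maintain.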

Protobuf: Criticism & Defense

  • Critics: using a wire-serialization format inside a single process is obviously wasteful; a 5x speedup shows the original architecture was “built wrong.”
  • Stronger critics call Protobuf “a joke” performance-wise and advocate zero-copy formats (FlatBuffers, Cap’n Proto, Arrow, custom layouts, etc.).
  • Defenders: Protobuf is already very fast for what it is; they see being only ~5x slower than a raw memory copy as impressive.
  • Ergonomics and tooling, not raw speed, are cited as primary reasons to choose Protobuf:
    • Cross-language codegen and type safety.
    • Stable, evolvable contracts across teams and languages.
    • A better fit than JSON/XML for IoT and binary-heavy workloads.

Why Protobuf Was Used Here

  • The pg_query library originally used JSON, then moved to Protobuf to provide typed bindings for multiple languages (Ruby, Go, Rust, Python, etc.).
  • Direct FFI would be fine for Rust alone but would require substantial, language-specific glue elsewhere; Protobuf kept that simpler.
  • For non–performance-critical uses, Protobuf is expected to remain in that ecosystem.

FFI vs Serialization

  • Some ask why Protobuf was “in the middle” at all when C ABIs are widely available.
  • Others explain: writing safe, high-quality bindings over complex C data structures is tedious and error-prone; serializing to a well-defined, owned format (Protobuf) sidesteps tricky ownership and pointer semantics.
  • The new Rust bindings effectively take on that complexity for better performance.
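
As an illustration of the ownership bookkeeping that direct bindings take on, here is a minimal sketch of a safe wrapper around a C-owned pointer. The names (RawTree, raw_tree_new, raw_tree_free, Tree, freed_count) are hypothetical and not taken from pg_query. The "C" functions are implemented in Rust only to keep the example self-contained, and the counter exists purely so the free can be observed.

```rust
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls to the (pretend) C free function, for demonstration only.
static FREED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical C struct describing a parsed tree.
#[repr(C)]
pub struct RawTree {
    pub node_count: usize,
}

/// Stand-in for a C constructor that returns a heap pointer the caller must free.
pub extern "C" fn raw_tree_new(node_count: usize) -> *mut RawTree {
    Box::into_raw(Box::new(RawTree { node_count }))
}

/// Stand-in for the matching C destructor. Must be called exactly once per tree.
pub unsafe extern "C" fn raw_tree_free(tree: *mut RawTree) {
    if !tree.is_null() {
        // SAFETY (assumed contract): `tree` came from `raw_tree_new`
        // and has not been freed before.
        unsafe { drop(Box::from_raw(tree)) };
        FREED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Safe wrapper: owns the raw pointer and frees it exactly once on drop.
pub struct Tree {
    raw: NonNull<RawTree>,
}

impl Tree {
    /// Returns `None` if the C side handed back a null pointer.
    pub fn new(node_count: usize) -> Option<Tree> {
        NonNull::new(raw_tree_new(node_count)).map(|raw| Tree { raw })
    }

    pub fn node_count(&self) -> usize {
        // SAFETY: `raw` is non-null and stays valid until `drop` runs.
        unsafe { self.raw.as_ref().node_count }
    }
}

impl Drop for Tree {
    fn drop(&mut self) {
        // SAFETY: we own the pointer and nothing else frees it.
        unsafe { raw_tree_free(self.raw.as_ptr()) }
    }
}

/// Number of trees released so far.
pub fn freed_count() -> usize {
    FREED.load(Ordering::SeqCst)
}
```

Creating a tree with `Tree::new(5)` and letting it go out of scope raises `freed_count()` by one. With a serialized format none of this is needed, because the decoded message is an ordinary owned Rust value; that is the convenience the thread says Protobuf was buying.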

Performance & Appropriateness

  • Multiple comments highlight the general lesson: big speedups often come from removing unnecessary serialization, not from switching languages.
  • For typical “CRUD over strings/UUIDs” apps, several argue Protobuf (or even JSON) is usually fine and simpler; micro-optimizing ser/de is premature.
  • In data- and compute-heavy domains (3D data, analytics, etc.), binary formats and zero-copy layouts can be crucial and justify the extra complexity.
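
For readers unfamiliar with what "zero-copy" means here, a minimal sketch follows. It uses a made-up fixed-width layout (three little-endian f32 values per point), not the actual FlatBuffers, Cap’n Proto, or Arrow formats, and all names are hypothetical. It contrasts decoding everything up front with reading fields in place from the received buffer.

```rust
/// Bytes per point in this made-up layout: x, y, z as little-endian f32.
pub const RECORD_SIZE: usize = 12;

fn read_f32(bytes: &[u8], offset: usize) -> f32 {
    let b = &bytes[offset..offset + 4];
    f32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Writes points into the fixed-width layout.
pub fn encode_points(points: &[[f32; 3]]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(points.len() * RECORD_SIZE);
    for p in points {
        for v in p {
            buf.extend_from_slice(&v.to_le_bytes());
        }
    }
    buf
}

/// Eager decode: allocates and copies every point before any of them is used.
pub fn decode_all(bytes: &[u8]) -> Vec<[f32; 3]> {
    bytes
        .chunks_exact(RECORD_SIZE)
        .map(|c| [read_f32(c, 0), read_f32(c, 4), read_f32(c, 8)])
        .collect()
}

/// Zero-copy view: borrows the buffer and reads a field only when asked.
pub struct PointsView<'a> {
    bytes: &'a [u8],
}

impl<'a> PointsView<'a> {
    /// Rejects buffers that are not a whole number of records.
    pub fn new(bytes: &'a [u8]) -> Option<PointsView<'a>> {
        if bytes.len() % RECORD_SIZE == 0 {
            Some(PointsView { bytes })
        } else {
            None
        }
    }

    pub fn len(&self) -> usize {
        self.bytes.len() / RECORD_SIZE
    }

    /// Reads one coordinate (axis 0, 1, or 2) of one point, without allocating.
    pub fn get(&self, index: usize, axis: usize) -> Option<f32> {
        if index >= self.len() || axis >= 3 {
            return None;
        }
        Some(read_f32(self.bytes, index * RECORD_SIZE + axis * 4))
    }
}
```

For a buffer holding millions of points where only a few are inspected, the view avoids the up-front allocation and copy that `decode_all` performs. For small messages of strings and IDs, the difference is unlikely to matter, which is the point the "CRUD" commenters make.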

Safety & Stability Concerns

  • At least one commenter warns that shared-memory IPC/FFI is fragile and hard to keep stable; serialization exists partly to avoid these hazards.
  • Others reply that in this case the Postgres “ABI” is relatively stable and the generated output is machine-verifiable, making the trade-off acceptable for this project.