2026-06-23

F3

What F3 Is (as inferred from the thread)

Columnar data storage format intended as an alternative to Parquet/ORC/Nimble/Lance, not a general file format.
Designed for analytics / “big data” workloads, with focus on random access and extensibility.
Embeds WebAssembly (Wasm) decoders in each file so that files are self-describing and forward-compatible: a reader can decode an encoding it has never seen by running the bundled decoder.
Decoders appear to output Arrow-style buffers; format metadata itself is defined via FlatBuffers.
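The thread does not describe F3's actual decoder interface, so the following is only a sketch of what "a pure decoder that returns Arrow-style buffers" could look like. The run-length encoding and the function name decode_rle_int32 are hypothetical. The output follows Arrow's layout for a fixed-width primitive array: an LSB-ordered validity bitmap plus a contiguous little-endian values buffer.

```python
import struct
from typing import Tuple

# Hypothetical record layout (not F3's): run_length uint32, is_valid uint8, value int32, little-endian.
RECORD = struct.Struct("<IBi")


def decode_rle_int32(encoded: bytes) -> Tuple[bytes, bytes, int]:
    """Pure function: encoded bytes in, (validity_bitmap, values_buffer, length) out."""
    if len(encoded) % RECORD.size:
        raise ValueError("truncated run-length record")
    validity = bytearray()
    values = bytearray()
    length = 0
    for offset in range(0, len(encoded), RECORD.size):
        run, is_valid, value = RECORD.unpack_from(encoded, offset)
        for _ in range(run):
            # Arrow validity bitmaps are LSB-first: slot i is bit (i % 8) of byte (i // 8).
            if length % 8 == 0:
                validity.append(0)
            if is_valid:
                validity[length // 8] |= 1 << (length % 8)
            # Null slots still occupy a fixed-width slot in the values buffer; write 0 there.
            values += struct.pack("<i", value if is_valid else 0)
            length += 1
    return bytes(validity), bytes(values), length


# Usage: three valid 7s followed by two nulls.
encoded = RECORD.pack(3, 1, 7) + RECORD.pack(2, 0, 0)
validity, values, n = decode_rle_int32(encoded)
```

Because the function has no side effects and touches no host state, a host could run its Wasm equivalent in a sandbox and simply copy out the returned buffers.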

Critique of Documentation and “Why”

Many readers find the GitHub README vague and marketing-heavy: unclear what the format does, what problems it solves, or where it should be used.
The core rationale is mostly in the linked research paper; the repo alone is considered hard to understand.
Readers ask for the advantages over Parquet, backed by metrics, to be summarized directly in the README.

Motivation vs Parquet and Other Formats

Cited shortcomings of Parquet include: hardware-oblivious design, global/awkward metadata, difficulty adding new encodings while maintaining compatibility, and weak random access.
Some argue these could be addressed by investing more engineering into Parquet or alternative formats like Vortex or Lance.
Others see value in new formats for mixed batch + random access and ML workloads, though Parquet’s broad compatibility remains a major moat.

Embedded Wasm Decoders: Pros and Cons

Proponents:
- Solves forward-compatibility for new encodings without updating every reader.
- Platform-independent, sandboxed VM; decoders can be pure functions returning buffers.
- Similar ideas have existed (RAR VM, fonts, Anyblox); Wasm runtimes can limit memory and instruction counts.
Skeptics:
- Embedding executable code in data files increases attack surface (RCE, DoS, compression bombs).
- Even with sandboxing, bugs in Wasm engines or host interfaces are likely.
- Makes ingestion of untrusted data risky unless Wasm is disabled, which undercuts a key selling point.
- Debugging third-party Wasm decoders can be painful.
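The compression-bomb and DoS concern, and the reply that runtimes can cap memory and instruction counts, can be illustrated without a Wasm engine. The sketch below is a plain-Python analogy, not Wasm. It uses a hypothetical run-length layout and hypothetical limit names (max_values, max_steps). The output cap is checked against each run's declared length before anything is allocated. The step counter is a crude stand-in for instruction metering.

```python
import struct
from typing import List

# Hypothetical record layout: run_length uint32, value int32, little-endian.
RECORD = struct.Struct("<Ii")


class DecodeLimitExceeded(Exception):
    """Raised when decoding would exceed a host-imposed budget (illustrative)."""


def decode_rle_bounded(encoded: bytes, max_values: int, max_steps: int) -> List[int]:
    """Decode run-length records, refusing inputs that exceed the given budgets."""
    if len(encoded) % RECORD.size:
        raise ValueError("truncated run-length record")
    out: List[int] = []
    steps = 0
    for offset in range(0, len(encoded), RECORD.size):
        run, value = RECORD.unpack_from(encoded, offset)
        # Reject oversized runs from their declared length, so an 8-byte input
        # claiming billions of values fails before any memory is allocated.
        if len(out) + run > max_values:
            raise DecodeLimitExceeded(f"output would exceed {max_values} values")
        # Charge one step per emitted value plus one per record (stand-in for fuel).
        steps += run + 1
        if steps > max_steps:
            raise DecodeLimitExceeded(f"step budget of {max_steps} exhausted")
        out.extend([value] * run)
    return out


# Usage: a well-formed input decodes; a tiny "bomb" claiming 4e9 values is rejected.
small = decode_rle_bounded(RECORD.pack(3, 9), max_values=10, max_steps=100)
```

In a real embedding the host runtime would enforce these caps rather than the decoder itself, since untrusted code cannot be trusted to police itself. Wasmtime, for example, supports fuel-based instruction metering. The skeptics' point still stands: the caps are only as sound as the engine that enforces them.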

Performance, Adoption, and Longevity

Concerns that Wasm-based decoding may be slower and interfere with engine-level optimizations (e.g., DuckDB-style vectorization).
Doubts about whether a research project with few recent commits and no ecosystem support can displace Parquet.
Some see F3 as potentially better for archival, but others argue that simple, text-like formats (CSV/JSON) or Parquet itself are more future-proof.
Overall sentiment: interesting, clever idea with serious practical, security, and adoption hurdles.
