Automatically Translating C to Rust

Autotranslating C to Rust: Value and Limitations

  • Many commenters report that C→Rust tools (e.g., c2rust) produce Rust that reads like “compiler output”: heavy on unsafe, hard to read, and semantically still “C that crashes the same way.”
  • Others counter that such tools are still useful as a bootstrap: get a whole C codebase building as Rust, then gradually refactor toward safe, idiomatic Rust.
  • There are real-world successes (e.g., translating bzip2), but even those often retain 100+ unsafe uses and are far from fully safe.
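
To make the "bootstrap, then refactor" idea concrete, here is a minimal sketch of the two stages. The C function in the comment and both Rust functions are hypothetical illustrations, not actual c2rust output; real tool output is typically more verbose.

```rust
// Hypothetical C original, for reference:
//   int sum(const int *xs, size_t n) {
//       int total = 0;
//       for (size_t i = 0; i < n; i++) total += xs[i];
//       return total;
//   }

/// Roughly the shape a mechanical translation takes: raw pointer,
/// separate length, and an `unsafe` signature. Wrapping arithmetic is
/// used here only so the sketch never panics on overflow.
pub unsafe fn sum_translated(xs: *const i32, n: usize) -> i32 {
    let mut total: i32 = 0;
    let mut i: usize = 0;
    while i < n {
        // A wrong `n` is still undefined behavior, exactly as in C.
        total = total.wrapping_add(unsafe { *xs.add(i) });
        i += 1;
    }
    total
}

/// A hand-refactored version: the slice carries its own length, so
/// the function is safe to call with any argument.
pub fn sum_refactored(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let data = [1, 2, 3, 4];
    // SAFETY: pointer and length come from the same live array.
    let translated = unsafe { sum_translated(data.as_ptr(), data.len()) };
    assert_eq!(translated, sum_refactored(&data));
}
```

The first function compiles and runs, which is the bootstrap value; the second is where the safety benefit actually appears, and getting there requires knowing that `xs` and `n` describe one array.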

Fil-C, GC, and Rust’s Niche

  • Fil-C is highlighted as making C “memory safe” via a smart compiler + GC, sometimes outperforming naïve .clone()-heavy Rust-style code.
  • However, its performance overhead (up to ~4× in some cases) and lack of data-race prevention mean it does not cover everything Rust targets, especially around concurrency.
  • Suggested division of labor: Fil-C for running legacy/unported C; automatic C→Rust for starting a port; hypothetical “Fil-Rust” to sandbox unsafe Rust during migration.
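
As a small illustration of the concurrency point, the sketch below shows the kind of shared-state code where Rust's type system rules out data races at compile time. The function name `parallel_count` is hypothetical; the example only uses the standard library.

```rust
use std::sync::Mutex;
use std::thread;

/// Increments a shared counter from several threads.
/// Replacing the `Mutex<u64>` with a plain `u64` mutated from each
/// thread would be rejected by the compiler rather than racing at runtime.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    let counter = Mutex::new(0u64);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    });
    // All scoped threads have joined here, so the mutex can be consumed.
    counter.into_inner().unwrap()
}

fn main() {
    assert_eq!(parallel_count(4, 1000), 4000);
}
```

A memory-safe C implementation can keep the equivalent unsynchronized C program from corrupting memory, but, per the discussion, it does not reject the race itself.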

Incremental Migration and FFI

  • Some argue auto-translating to unsafe Rust is pointless; you still need deep understanding and major redesign to reach safe, idiomatic Rust.
  • Others say incremental, function-by-function migration is possible using unsafe wrappers or less idiomatic abstractions (Cell, RefCell, etc.), though most of the safety benefit arrives only near the end of the migration.
  • A competing view: what’s really needed is “painless FFI” and tools that let Rust call C using slices and safe types rather than rewriting everything.
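
The "safe types over FFI" view can be sketched as a thin wrapper that takes a slice and passes pointer plus length across the boundary. Everything here is an assumption for illustration: `c_checksum` stands in for a C function with the signature `uint32_t c_checksum(const uint8_t *buf, size_t len)` and is written in Rust so the example runs without linking any C code.

```rust
/// Stand-in for a C function; in a real port this would be an
/// external declaration resolved by the linker.
unsafe extern "C" fn c_checksum(buf: *const u8, len: usize) -> u32 {
    let mut acc: u32 = 0;
    for i in 0..len {
        let byte = unsafe { *buf.add(i) };
        acc = acc.wrapping_add(u32::from(byte));
    }
    acc
}

/// Safe wrapper: callers pass a slice and cannot get the length wrong.
pub fn checksum(data: &[u8]) -> u32 {
    // SAFETY: the pointer and length come from the same valid slice,
    // and the callee only reads `len` bytes.
    unsafe { c_checksum(data.as_ptr(), data.len()) }
}

fn main() {
    assert_eq!(checksum(&[1, 2, 3]), 6);
}
```

The C implementation stays untouched; only the boundary is made safe, which is the trade-off this view favors over rewriting.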

Hard Technical Problems: Arrays, Aliasing, and Provenance

  • A key unsolved challenge is inferring array sizes and bounds globally so C pointers can be turned into Rust slices/Vecs; this is tied to ongoing work (e.g., DARPA TRACTOR).
  • Discussion dives into strict aliasing in C vs Rust’s model:
    • Rust has no C-style type-based strict aliasing rule, but it does have validity rules (some bit patterns are invalid for a type, loosely analogous to C trap representations) and an evolving notion of pointer provenance.
    • Type punning that is UB in C due to effective types may be allowed in Rust at the aliasing level but can still be UB via invalid value representations (e.g., punning into bool).
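
A minimal sketch of the distinction, with hypothetical function names. Reading an `f32` through a `*const u32` is shown as an example of a pun that does not run afoul of type-based aliasing in Rust (this assumes `f32` and `u32` share size and alignment, as on mainstream targets), while turning an arbitrary byte into a `bool` needs a check because only 0 and 1 are valid `bool` values.

```rust
/// Reads the bytes of an `f32` through a `*const u32`.
fn pun_via_pointer(x: f32) -> u32 {
    let p = &x as *const f32 as *const u32;
    // SAFETY: same size and alignment assumed; every 32-bit pattern
    // is a valid `u32`, so no invalid value can be produced.
    unsafe { *p }
}

/// The same result through the standard library, with no `unsafe`.
fn pun_via_api(x: f32) -> u32 {
    x.to_bits()
}

/// Checked conversion: only 0 and 1 are valid bit patterns for `bool`.
fn byte_to_bool(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

fn main() {
    assert_eq!(pun_via_pointer(1.0), 0x3f80_0000);
    assert_eq!(pun_via_api(1.0), pun_via_pointer(1.0));
    assert_eq!(byte_to_bool(2), None);
    // By contrast, this would be undefined behavior, not an aliasing
    // violation but an invalid value:
    // let b: bool = unsafe { std::mem::transmute(2u8) };
}
```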

Rust Popularity, LLMs, and Future Rewrites

  • Some speculate about a future where Rust falls out of favor and people want Rust→C translators; others see Rust as having reached “critical mass” for secure, performant systems code.
  • Debate over LLMs:
    • Claims that “Rust is the winner of the LLM era” clash with reports that current models struggle with lifetimes and complex Rust, requiring significant human correction.
    • Separate thread: GitHub’s vulnerability graph allegedly spikes after LLMs became widespread, suggesting a growing class of simple, non-memory-safety bugs.

Idiomatic vs Safe Rust

  • Several commenters distinguish “idiomatic” from merely “safe”:
    • It may be feasible to auto-generate non-idiomatic but safe Rust (correct lifetimes, Box/slices) for simpler C code.
    • Truly idiomatic Rust requires recognizing higher-level patterns (data structures, ownership models) that C couldn’t abstract; this is seen as closer to a creative or AGI-level task.
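
The gap between "safe" and "idiomatic" can be seen even in a tiny function. Both versions below are hypothetical illustrations: the first keeps the C shape (status code plus out-parameter) while using only safe Rust, and the second expresses the same intent the way Rust code usually would.

```rust
/// Safe but non-idiomatic: mirrors a C signature like
/// `int find_first_negative(const int *xs, size_t n, size_t *out_index)`.
/// Returns 0 on success and -1 if no negative element exists.
fn find_first_negative_c_style(xs: &[i32], out_index: &mut usize) -> i32 {
    let mut i = 0;
    while i < xs.len() {
        if xs[i] < 0 {
            *out_index = i;
            return 0;
        }
        i += 1;
    }
    -1
}

/// Idiomatic: "found or not" is encoded in the return type.
fn find_first_negative(xs: &[i32]) -> Option<usize> {
    xs.iter().position(|&x| x < 0)
}

fn main() {
    let xs = [3, 7, -2, 5];
    let mut idx = 0;
    assert_eq!(find_first_negative_c_style(&xs, &mut idx), 0);
    assert_eq!(idx, 2);
    assert_eq!(find_first_negative(&xs), Some(2));
}
```

A translator could plausibly produce the first form mechanically; producing the second means recognizing that the status code and out-parameter together encode an optional result.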

Rust Coreutils Size Concern

  • A side discussion notes a seemingly huge /usr/bin/ls after switching to Rust coreutils; clarified that:
    • Rust coreutils are shipped as one ~12–13 MB binary hardlinked under many names.
    • Overall size increase over GNU coreutils is modest and dwarfed by the rest of /usr/bin.
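
For readers unfamiliar with the single-binary layout, the sketch below shows the general multicall technique: one executable inspects the name it was invoked under and dispatches accordingly. This is an illustration of the idea only, not the Rust coreutils source; `applet_name` and `dispatch` are hypothetical names.

```rust
use std::path::Path;

/// Extracts the applet name from the invocation path,
/// e.g. "/usr/bin/ls" becomes "ls".
fn applet_name(argv0: &str) -> &str {
    Path::new(argv0)
        .file_name()
        .and_then(|s| s.to_str())
        .unwrap_or(argv0)
}

/// Maps an invocation name to the applet that would run.
fn dispatch(argv0: &str) -> Option<&'static str> {
    match applet_name(argv0) {
        "ls" => Some("list directory contents"),
        "cat" => Some("concatenate files"),
        "wc" => Some("count lines, words, bytes"),
        _ => None,
    }
}

fn main() {
    // Every hardlink points at the same file; only argv[0] differs.
    let argv0 = std::env::args().next().unwrap_or_default();
    match dispatch(&argv0) {
        Some(what) => println!("{}: {}", applet_name(&argv0), what),
        None => println!("unknown applet: {}", applet_name(&argv0)),
    }
}
```

Because each name is a hardlink to the same file, a per-file size listing reports the full binary size for every applet even though the data is stored once.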