2024-07-30

Translating All C to Rust (TRACTOR)

Program goals and DARPA context

TRACTOR aims for highly automated translation of existing C to Rust, ideally producing code of the quality a skilled Rust developer would write and eliminating memory‑safety bugs.
Commenters stress that DARPA routinely funds “DARPA‑hard” problems: success is not guaranteed; partial wins and research outputs are expected.
Prior related efforts exist (c2rust, ROSE, CRAM, Managed Sulong), some also DARPA/DOE funded.

Feasibility of C→Rust translation

Many think full, automatic translation to safe idiomatic Rust is extremely hard:
- Need to reconstruct lifetimes, ownership, aliasing, array sizes, concurrency discipline, and intended invariants that C doesn’t encode.
- C often contains real memory bugs; there is no unique “correct” safe Rust equivalent.
- Non‑affine pointer use, back‑references, unions, macros, and threading make borrow‑checker‑friendly structures nontrivial.
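The back‑reference problem can be illustrated with a small sketch. In C, a tree node usually holds a raw parent pointer and raw child pointers, with nothing saying which side owns what. A Rust version has to make that decision explicit. The `Node` type and helper names below are hypothetical, and `Rc`/`Weak` with `RefCell` is only one of several possible designs (arenas with indices are another):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical tree node, for illustration only. The C original would
// typically store `struct node *parent` and an array of child pointers.
struct Node {
    name: String,
    // Back-reference: non-owning, so parent and child do not form a leak cycle.
    parent: RefCell<Weak<Node>>,
    // Owning references to the children.
    children: RefCell<Vec<Rc<Node>>>,
}

fn new_node(name: &str) -> Rc<Node> {
    Rc::new(Node {
        name: name.to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    // The child only gets a weak handle to its parent.
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

fn parent_name(node: &Node) -> Option<String> {
    // `upgrade` returns None if the parent is unset or already dropped.
    node.parent.borrow().upgrade().map(|p| p.name.clone())
}

fn main() {
    let root = new_node("root");
    let leaf = new_node("leaf");
    add_child(&root, Rc::clone(&leaf));
    println!("{:?}", parent_name(&leaf));
    println!("{:?}", parent_name(&root));
}
```

Choosing which edge is weak is exactly the kind of intent the C source does not record, which is why a translator has to infer it or ask a human.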
Likely outputs:
- Mechanical translation to mostly unsafe Rust (as c2rust does) is seen as doable but of limited value: bug‑compatible, unreadable, hard to refactor.
- Some suggest interactive tools: translate as far as possible, flag problem areas, and guide humans to redesign those parts.
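To make the gap between the two outputs concrete, here is a minimal sketch of the same summing routine in both styles. `sum_raw` only approximates the shape of a mechanical translation and is not actual c2rust output; both function names are hypothetical. Using `wrapping_add` is an assumption too: signed overflow is undefined in C, so any translation has to pick some defined behavior.

```rust
// Roughly what a mechanical translation looks like: raw pointer plus
// length, with the caller responsible for validity (illustrative only).
unsafe fn sum_raw(data: *const i32, len: usize) -> i32 {
    let mut total: i32 = 0;
    let mut i: usize = 0;
    while i < len {
        // SAFETY: the caller promises `data` points to `len` readable i32s.
        total = total.wrapping_add(unsafe { *data.add(i) });
        i += 1;
    }
    total
}

// What a Rust developer would more likely write: the slice carries its
// own length, so no unsafe code is needed.
fn sum_slice(data: &[i32]) -> i32 {
    data.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let values = [1, 2, 3, 4];
    // SAFETY: pointer and length come from the same live array.
    let raw = unsafe { sum_raw(values.as_ptr(), values.len()) };
    let safe = sum_slice(&values);
    println!("{raw} {safe}");
}
```

Getting from the first form to the second requires knowing that the pointer and the length always travel together, which is the array‑size information the C signature does not state.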

Rust, undefined behavior, and Miri

Strong claim in thread: by design, safe Rust (excluding compiler bugs and unsound unsafe code beneath safe abstractions) cannot exhibit UB; UB always originates in unsafe code.
Counterpoints:
- UB can “leak” into safe code via unsound unsafe libraries or std; there have been such bugs.
- Compiler and LLVM bugs exist; these are treated as implementation bugs, not language‑level UB.
Miri (a MIR interpreter) is discussed:
- Used to detect UB in unsafe Rust; useful but not sufficient to guarantee safety.
- Some argue referencing Miri doesn’t refute Rust’s safety model; others use it to highlight real UB cases and temper absolutist safety rhetoric.
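A small sketch may help show what "unsafe beneath abstractions" means here. The hypothetical `first_n` below has a safe signature but an unsafe body. It is sound only because of the bounds check; if that check were deleted, callers writing nothing but safe code could trigger UB, which is the "leak" the counterpoints describe.

```rust
// Hypothetical helper: safe to call, implemented with unsafe code.
fn first_n(data: &[u8], n: usize) -> Option<&[u8]> {
    // This check is what makes the abstraction sound.
    if n > data.len() {
        return None;
    }
    // SAFETY: `n <= data.len()` was verified above, so the range is in bounds.
    Some(unsafe { data.get_unchecked(..n) })
}

fn main() {
    let bytes = [10u8, 20, 30];
    println!("{:?}", first_n(&bytes, 2));
    println!("{:?}", first_n(&bytes, 5));
}
```

Miri can flag the out‑of‑bounds access in an unsound variant, but only if some test actually exercises that path (for example via `cargo miri test` on a nightly toolchain). That is consistent with the point that Miri is useful but does not by itself guarantee safety.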

Alternatives to “rewrite it in Rust”

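Several argue for improving existing C/C++ instead, using:
- Formal verification and analysis tools (CBMC for bounded model checking, Frama‑C for static analysis and deductive verification; Kani, also named in the thread, is a CBMC‑based checker for Rust rather than C), sanitizers, strict coding standards (MISRA, JPL), and safer libraries.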
- Claim: you can get very strong guarantees for existing C if you invest in these tools.
Others respond that:
- Industry has had decades to adopt such practices; memory‑safety CVEs still dominate.
- Safer defaults (Rust, Ada/SPARK, managed languages) reduce reliance on exceptional discipline.

AI’s potential role

Some see LLMs + verification as promising: mechanical C→Rust pass, then AI‑guided refactoring, checked by compilers, tests, fuzzers, sanitizers, Miri.
Skeptics note LLMs are weak at long‑range correctness; automatic translation may still require heavy human validation.
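One way the "checked by tests" step could look is differential testing: run the mechanical translation and the refactored candidate on the same random inputs and report the first disagreement. The sketch below assumes two hypothetical stand‑in functions, `max_translated` and `max_refactored`, and uses a tiny hand‑rolled xorshift generator so it needs no external crates. It is not a description of any actual TRACTOR tooling.

```rust
// Stand-in for a mechanically translated function (hypothetical).
// Mirrors a C-style loop; returns i32::MIN for empty input.
fn max_translated(data: &[i32]) -> i32 {
    let mut best = i32::MIN;
    let mut i = 0;
    while i < data.len() {
        if data[i] > best {
            best = data[i];
        }
        i += 1;
    }
    best
}

// Stand-in for the refactored candidate, e.g. one proposed by an LLM.
fn max_refactored(data: &[i32]) -> i32 {
    data.iter().copied().max().unwrap_or(i32::MIN)
}

// Minimal xorshift generator; the seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Returns the first input on which the two versions disagree, if any.
fn first_mismatch(rounds: usize, seed: u64) -> Option<Vec<i32>> {
    let mut rng = XorShift(seed);
    for _ in 0..rounds {
        let len = (rng.next_u64() % 16) as usize;
        let input: Vec<i32> = (0..len).map(|_| rng.next_u64() as i32).collect();
        if max_translated(&input) != max_refactored(&input) {
            return Some(input);
        }
    }
    None
}

fn main() {
    match first_mismatch(1000, 42) {
        Some(input) => println!("mismatch on {:?}", input),
        None => println!("no mismatch found"),
    }
}
```

Agreement on random inputs is evidence, not proof, which fits the skeptics' point: a check like this narrows what humans must review but does not replace that review.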

Maintainability and organizational issues

Concern that machine‑translated “shitty Rust” will be harder to maintain than the original C and drive developers away.
For long‑lived systems, teams must actually understand and be able to evolve the translated Rust; otherwise C remains the de facto source.
