Pnut: A C to POSIX shell compiler you can trust
Project goals and motivation
- Intended as a C→POSIX shell compiler that produces human-readable scripts.
- Main stated goal: help with “trusting trust” concerns and bootstrappable build chains (e.g., bootstrap Pnut itself, then a native backend, then TCC, then GCC) using only a POSIX shell and source.
- Some commenters find it clever but the opposite of what they want: they would rather compile C to portable binaries, e.g., with other toolchains.
- Others value it as an exploration of Unix “shell as glue” and as a conceptual demo of what POSIX sh can do.
Implementation approach and language subset
- Uses only POSIX shell builtins (primarily read and printf) and no external utilities, to maximize portability.
- Memory is modeled via many numbered variables (_0, _1, …) and arithmetic expansion, since POSIX sh lacks arrays.
- All compiler-generated variables hold numbers only, so the generated code often omits quoting; this conflicts with common shell best practices and with tools like ShellCheck.
- C support is a restricted subset: unsigned types, static variables, arrays, glob.h, and many libc and POSIX APIs (e.g., open modes, socket, lseek, mmap, pthread, setjmp, dlopen) are missing or only partly handled.
- Some constructs “compile” to calls like _glob or _socket that are not implemented.
- Pointers are mapped onto the same underlying representation as integers; parameter types like int vs int* are not distinguished.
- Wrapping arithmetic and precise C undefined behavior are not modeled.
I/O and binary data
- Examples include base64 and SHA-256 implemented within the constraints.
- Input is read via read -r, which cannot handle NUL bytes; the authors acknowledge that the base64 example doesn’t support full binary input.
- Output of arbitrary bytes is possible using printf, which enables an x86 backend, but robust binary I/O in shell remains a limitation.
Performance and shell differences
- Heavy use of many variables can be slow in some shells (e.g., dash does linear lookup over many variables), but authors report acceptable times for bootstrapping Pnut itself.
- Shared benchmarks: for compiling pnut.c with pnut.sh, ksh is fastest, dash somewhat slower, bash slower still, and zsh much slower.
- Subshells are noted as a major bottleneck; the runtime library tries to avoid them.
Usefulness vs. practicality
- Enthusiasts like that it stretches what POSIX sh can do and fits into bootstrapping/Stage0/bootstrappable-builds efforts.
- Skeptics question why anyone would want to write C for shell-like tasks, or to generate slower, less capable shell code instead of portable binaries.
- Several argue most nontrivial shell scripts should instead be written in higher-level languages (Python, Rust, etc.) for maintainability and debuggability.
- Debates spill into build systems: whether having a dedicated DSL (make, CMake, Meson) is better than using C itself; opinions are strongly split.
Trust, security, and messaging
- “You can trust” tagline is widely criticized as marketing; readers say being told to trust something makes them more suspicious.
- Others connect the “trust” framing explicitly to Ken Thompson’s “Reflections on Trusting Trust” and to diverse double-compiling, arguing that human-auditable shell output from multiple independent shells can improve confidence.
- Some note that trust still ultimately rests on the shell implementation and environment.
Critiques and open issues
- Generated scripts trigger many ShellCheck warnings and errors; some are acknowledged as analyzer limitations, others less clearly so.
- Error handling is often missing or oversimplified: in examples like cp and cat, writes don’t check for errors or partial writes.
- Some operations emit explicit runtime “unknown mode” errors instead of implementing full behavior.
- There are reports of code with undeclared identifiers compiling without diagnostics, suggesting poor or absent semantic checking.
- Several commenters argue the project should more clearly document its C subset and limitations; others see the current state as an impressive but incomplete prototype.