RustGPT: A pure-Rust transformer LLM built from scratch
Dependency Tree & Cargo Semver Behavior
- Commenters inspect `cargo tree` and note the project has only three direct dependencies (`ndarray`, `rand`, `rand_distr`), seen as lean for a non-trivial project.
- Discussion dives deep into Cargo’s version resolution (illustrated in the sketch after this list):
  - Dependency specifications like `0.9` or `0.9.3` are treated as semver ranges with an implicit `^` operator.
  - Cargo tries to unify to a single version per major (or “0.x minor”) version; multiple versions appear only when constraints are semver-incompatible (e.g., `0.8` and `0.7.1`).
  - Exact pinning with `=0.9.3` is possible but discouraged for libraries because it fragments dependency graphs.
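A hypothetical `Cargo.toml` fragment illustrating the three requirement styles discussed above (the crate names and version numbers here are placeholders, not the project’s actual manifest):

```toml
[dependencies]
# "0.9" is shorthand for ^0.9: any 0.9.x release satisfies it.
ndarray = "0.9"
# "0.9.3" is shorthand for ^0.9.3: >=0.9.3 and <0.10.0.
rand = "0.9.3"
# "=0.9.3" pins exactly one version; discouraged for libraries because it
# blocks Cargo from unifying this crate with other consumers' requirements.
rand_distr = "=0.9.3"
```

For a given crate, Cargo resolves all compatible caret requirements to a single version; an exact `=` pin removes that flexibility, which is why the thread discourages it for libraries.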
“From Scratch” & Use of Libraries
- Some see the small, focused dependency set as a sign of quality.
- Others argue that “from scratch” is overstated if core operations are delegated to existing libraries, but also note reusing libraries is sensible and reimplementation isn’t inherently better.
Code Readability, Style & Possible AI Generation
- Many praise the code’s readability and straightforward structure, contrasting it with more complex, generic-heavy Rust.
- Others criticize it as overly procedural and not idiomatic “modern Rust” (few iterators/enums); the contrast is sketched after this list.
- Multiple commenters suspect README and portions of the code are LLM-generated (“vibe-coded”): telltale comments, emojis, file naming, and commit style.
- Debate over whether AI-generated Rust will cause code quality to “rot”: some say it’s fine if humans clean up and focus effort on the hard parts, while others say sloppy comments and duplicated patterns reveal shallow understanding.
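To make the style debate concrete, a hypothetical contrast between the procedural pattern critics describe and the iterator-based form they would prefer (neither snippet is taken from the repository):

```rust
// Procedural, index-based style (what some commenters call "not modern Rust").
fn sum_squares_indexed(xs: &[f32]) -> f32 {
    let mut total = 0.0;
    for i in 0..xs.len() {
        total += xs[i] * xs[i];
    }
    total
}

// Iterator-based style often considered more idiomatic.
fn sum_squares_iter(xs: &[f32]) -> f32 {
    xs.iter().map(|x| x * x).sum()
}
```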
Training Data, Behavior & Toy Nature
- The model’s training data is tiny and embedded directly in `main.rs` (dozens of factual statements).
- When prompted off-distribution, it quickly breaks down into nonsense outputs, reinforcing that this is a learning toy, not a usable LLM.
- Suggestions include using public instruction and text datasets from Hugging Face and adding numerical gradient checks (a sketch of such a check follows below).
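A numerical gradient check of the kind suggested above typically compares an analytic gradient against a central finite difference. A minimal, self-contained sketch (not tied to the project’s actual loss or layer types):

```rust
/// Central-difference check of an analytic gradient for a scalar function.
/// Returns the largest absolute difference between the two estimates.
fn gradient_check<F>(f: F, params: &[f64], analytic_grad: &[f64], eps: f64) -> f64
where
    F: Fn(&[f64]) -> f64,
{
    let mut max_err = 0.0f64;
    let mut p = params.to_vec();
    for i in 0..p.len() {
        let orig = p[i];
        p[i] = orig + eps;
        let plus = f(&p);
        p[i] = orig - eps;
        let minus = f(&p);
        p[i] = orig;
        let numeric = (plus - minus) / (2.0 * eps);
        max_err = max_err.max((numeric - analytic_grad[i]).abs());
    }
    max_err
}

fn main() {
    // Toy example: f(x) = x0^2 + 3*x1, so grad = [2*x0, 3].
    let params = [1.5, -2.0];
    let analytic = [2.0 * params[0], 3.0];
    let err = gradient_check(|x| x[0] * x[0] + 3.0 * x[1], &params, &analytic, 1e-5);
    println!("max gradient error: {err:e}"); // should be roughly 1e-10 or smaller
}
```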
Rust vs Python: Tooling, Ecosystem & Performance
- Several express relief at “just `cargo run`” compared to repeated stories of Python dependency hell.
- A long subthread debates:
  - Whether easy dependency inclusion (Cargo/npm style) is a feature or a trap that encourages dependency bloat and security risk.
  - Centralized package registries vs more deliberate, higher-friction dependency models (Zig/Odin-style).
  - Python packaging’s longstanding problems vs improvements with `pyproject.toml` and tools like `uv` (often described as “cargo for Python”).
- Some argue Python’s ecosystem is fundamentally flawed; others defend it as the de facto ML lingua franca whose C/C++ backends handle performance.
Rust in the ML Stack & Future Work
- Commenters are excited to see a pure-Rust transformer and note Rust’s memory safety helps avoid subtle bugs (e.g., buffer overflows in transformers).
- A few suggest GPU support, proper tokenization (e.g., BPE), and fixing architectural issues (e.g., reusing the same transformer block instance instead of separate layers; see the sketch after this list).
- Broader discussion touches on whether more of the AI ecosystem will or should migrate from Python to Rust/C++/other languages; consensus in the thread is mixed.
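The layer-stacking comment hinges on the difference between giving each transformer layer its own weights and applying one shared block repeatedly. A minimal sketch of the two shapes, using a placeholder `TransformerBlock` that is not the project’s actual type:

```rust
// Stand-in block type: a real implementation would hold attention and
// feed-forward weights rather than a single scalar.
struct TransformerBlock {
    scale: f32,
}

impl TransformerBlock {
    fn forward(&self, x: f32) -> f32 {
        // Placeholder computation standing in for attention + feed-forward.
        x * self.scale + 1.0
    }
}

/// Separate layers: each depth level owns its own independently trained weights.
fn forward_stacked(blocks: &[TransformerBlock], mut x: f32) -> f32 {
    for block in blocks {
        x = block.forward(x);
    }
    x
}

/// One shared block applied repeatedly: the weights are tied across depth.
fn forward_shared(block: &TransformerBlock, depth: usize, mut x: f32) -> f32 {
    for _ in 0..depth {
        x = block.forward(x);
    }
    x
}

fn main() {
    let stacked = vec![
        TransformerBlock { scale: 0.5 },
        TransformerBlock { scale: 1.5 },
        TransformerBlock { scale: 2.0 },
    ];
    let shared = TransformerBlock { scale: 1.5 };
    println!("stacked: {}", forward_stacked(&stacked, 1.0));
    println!("shared:  {}", forward_shared(&shared, 3, 1.0));
}
```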