RustGPT: A pure-Rust transformer LLM built from scratch

Dependency Tree & Cargo Semver Behavior

  • Commenters inspect cargo tree and note that the project has only three direct dependencies (ndarray, rand, rand_distr), which they see as lean for a non-trivial project.
  • Discussion dives deep into Cargo’s version resolution:
    • Dependency specifications like 0.9 or 0.9.3 are treated as semver ranges with an implicit ^ (caret) operator, so 0.9 means any compatible 0.9.x release.
    • Cargo tries to unify to a single version per major version (or per minor version for 0.x crates); multiple copies of a crate appear in the tree only when constraints are semver-incompatible (e.g., 0.8 vs 0.7.1).
    • Exact pinning with =0.9.3 is possible but discouraged for libraries because it fragments dependency graphs; see the Cargo.toml sketch after this list.
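
To make the version-range semantics concrete, here is a small Cargo.toml sketch. The crate names match the project's direct dependencies, but the version numbers are illustrative, not the project's actual requirements:

```toml
[dependencies]
# "0.9" is shorthand for "^0.9": any 0.9.x release satisfies it (>=0.9.0, <0.10.0).
ndarray = "0.9"

# "0.9.3" is "^0.9.3": >=0.9.3, <0.10.0. Cargo can unify this with a plain
# "0.9" requirement elsewhere in the graph onto one 0.9.x version.
rand = "0.9.3"

# An exact pin: only 0.9.3 ever satisfies it. This is why pinning is
# discouraged for libraries; another crate requiring, say, ^0.9.4 could no
# longer share this copy, fragmenting the dependency graph.
rand_distr = "=0.9.3"
```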

“From Scratch” & Use of Libraries

  • Some see the small, focused dependency set as a sign of quality.
  • Others argue that “from scratch” is overstated when core operations are delegated to existing libraries, while also noting that reusing libraries is sensible and that reimplementation isn’t inherently better.

Code Readability, Style & Possible AI Generation

  • Many praise the code’s readability and straightforward structure, contrasting it with more complex, generic-heavy Rust.
  • Others criticize it as overly procedural and not idiomatic “modern Rust” (few iterators or enums); the snippet after this list illustrates the kind of contrast they mean.
  • Multiple commenters suspect the README and portions of the code are LLM-generated (“vibe-coded”), citing telltale comments, emoji use, file naming, and commit style.
  • Commenters debate whether AI-generated Rust will “rot” code quality: some say it’s fine as long as humans clean up the output and focus their effort on the hard parts; others say the sloppy comments and duplicated patterns reveal shallow understanding.
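
As an illustration of the style contrast commenters describe (standalone code, not taken from the repository), the two functions below compute the same dot product, first in the procedural index-loop style being criticized, then with the iterator chain critics consider more idiomatic:

```rust
/// Procedural style: explicit index loop and a mutable accumulator.
fn dot_procedural(a: &[f64], b: &[f64]) -> f64 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Iterator style: zip the slices, multiply pairwise, and sum; no indexing, no `mut`.
fn dot_idiomatic(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let (a, b) = ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]);
    assert_eq!(dot_procedural(&a, &b), dot_idiomatic(&a, &b));
}
```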

Training Data, Behavior & Toy Nature

  • The model’s training data is tiny and embedded directly in main.rs (dozens of factual statements).
  • When prompted off-distribution, it quickly breaks down into nonsense outputs, reinforcing that this is a learning toy, not a usable LLM.
  • Suggestions include training on public instruction and text datasets from Hugging Face and adding numerical gradient checks to validate the backpropagation (see the sketch after this list).
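
A numerical gradient check compares an analytic gradient against a central finite difference. The sketch below is a minimal, self-contained version using ndarray; the loss closure and function names are placeholders for illustration, not the project's API:

```rust
use ndarray::Array1;

/// Perturb each parameter by ±eps and compare
/// (f(w + eps) - f(w - eps)) / (2 * eps) against the analytic grad[i].
/// Returns the worst relative error over all parameters.
fn gradient_check<F>(loss: F, w: &Array1<f64>, grad: &Array1<f64>, eps: f64) -> f64
where
    F: Fn(&Array1<f64>) -> f64,
{
    let mut max_rel_err = 0.0_f64;
    for i in 0..w.len() {
        let mut plus = w.clone();
        let mut minus = w.clone();
        plus[i] += eps;
        minus[i] -= eps;
        let numeric = (loss(&plus) - loss(&minus)) / (2.0 * eps);
        let analytic = grad[i];
        let rel = (numeric - analytic).abs() / (numeric.abs() + analytic.abs() + 1e-12);
        max_rel_err = max_rel_err.max(rel);
    }
    max_rel_err
}

fn main() {
    // Toy example: f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    let w = Array1::from(vec![0.3, -1.2, 2.0]);
    let loss = |v: &Array1<f64>| 0.5 * v.dot(v);
    let grad = w.clone();
    println!("max relative error: {:e}", gradient_check(loss, &w, &grad, 1e-5));
}
```

A maximum relative error on the order of 1e-6 or below is the usual sign that the analytic gradient matches; values near 1.0 typically indicate a backpropagation bug.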

Rust vs Python: Tooling, Ecosystem & Performance

  • Several express relief at “just cargo run” compared to repeated stories of Python dependency hell.
  • A long subthread debates:
    • Whether easy dependency inclusion (Cargo/npm style) is a feature or a trap that encourages dependency bloat and security risk.
    • Centralized package registries vs more intentional, frictionful dependency models (Zig/Odin-style).
    • Python packaging’s longstanding problems vs improvements with pyproject.toml and tools like uv (often described as “cargo for Python”).
    • Some argue Python’s ecosystem is fundamentally flawed; others defend it as the de facto ML lingua franca whose C/C++ backends handle performance.

Rust in the ML Stack & Future Work

  • Commenters are excited to see a pure-Rust transformer and note that Rust’s memory safety helps avoid subtle bugs such as buffer overflows in transformer implementations.
  • A few suggest GPU support, proper tokenization (e.g., BPE; a toy merge loop is sketched after this list), and fixing architectural issues (e.g., reusing the same transformer block instance rather than separate layers).
  • Broader discussion touches on whether more of the AI ecosystem will, or should, migrate from Python to Rust, C++, or other languages; opinions in the thread are mixed.
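
On the tokenization suggestion: byte-pair encoding builds a vocabulary by repeatedly merging the most frequent adjacent token pair. The toy trainer below is an illustration of that core merge loop, not the project's code; it skips pre-tokenization, byte fallback, and vocabulary bookkeeping that a real BPE tokenizer needs:

```rust
use std::collections::HashMap;

/// Toy BPE trainer: start from characters, then repeatedly merge the most
/// frequent adjacent pair into a single token. Returns the merge rules learned.
fn train_bpe(text: &str, num_merges: usize) -> Vec<(String, String)> {
    let mut tokens: Vec<String> = text.chars().map(|c| c.to_string()).collect();
    let mut merges = Vec::new();

    for _ in 0..num_merges {
        // Count every adjacent pair in the current token sequence.
        let mut counts: HashMap<(String, String), usize> = HashMap::new();
        for pair in tokens.windows(2) {
            *counts.entry((pair[0].clone(), pair[1].clone())).or_insert(0) += 1;
        }
        // Pick the most frequent pair; stop once nothing repeats.
        let Some((best, n)) = counts.into_iter().max_by_key(|(_, n)| *n) else { break };
        if n < 2 {
            break;
        }
        // Replace every occurrence of the pair with the merged token.
        let mut merged = Vec::with_capacity(tokens.len());
        let mut i = 0;
        while i < tokens.len() {
            if i + 1 < tokens.len() && tokens[i] == best.0 && tokens[i + 1] == best.1 {
                merged.push(format!("{}{}", best.0, best.1));
                i += 2;
            } else {
                merged.push(tokens[i].clone());
                i += 1;
            }
        }
        tokens = merged;
        merges.push(best);
    }
    merges
}

fn main() {
    let merges = train_bpe("low lower lowest low low", 10);
    println!("{merges:?}");
}
```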