RustGPT: A pure-Rust transformer LLM built from scratch

Dependency Tree & Cargo Semver Behavior

  • Commenters inspect cargo tree and note that the project has only three direct dependencies (ndarray, rand, rand_distr), which they see as lean for a non-trivial project.
  • Discussion dives deep into Cargo’s version resolution:
    • Dependency specifications like 0.9 or 0.9.3 are treated as semver ranges with an implicit ^ (caret) operator, so 0.9 means any compatible 0.9.x release.
    • Cargo tries to unify to a single version per major version (or per minor version for 0.x crates); multiple copies of a crate appear in the tree only when constraints are semver-incompatible (e.g., 0.8 vs 0.7.1).
    • Exact pinning with =0.9.3 is possible but discouraged for libraries because it fragments dependency graphs; see the Cargo.toml sketch after this list.
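
To make the version-range semantics concrete, here is a small Cargo.toml sketch. The crate names match the project's direct dependencies, but the version numbers are illustrative, not the project's actual requirements:

```toml
[dependencies]
# "0.9" is shorthand for "^0.9": any 0.9.x release satisfies it (>=0.9.0, <0.10.0).
ndarray = "0.9"

# "0.9.3" is "^0.9.3": >=0.9.3, <0.10.0. Cargo can unify this with a plain
# "0.9" requirement elsewhere in the graph onto one 0.9.x version.
rand = "0.9.3"

# An exact pin: only 0.9.3 ever satisfies it. This is why pinning is
# discouraged for libraries; another crate requiring, say, ^0.9.4 could no
# longer share this copy, fragmenting the dependency graph.
rand_distr = "=0.9.3"
```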

“From Scratch” & Use of Libraries

  • Some see the small, focused dependency set as a sign of quality.
  • Others argue that “from scratch” is overstated when core operations are delegated to existing libraries, while also noting that reusing libraries is sensible and that reimplementation isn’t inherently better.

Code Readability, Style & Possible AI Generation

  • Many praise the code’s readability and straightforward structure, contrasting it with more complex, generic-heavy Rust.
  • Others criticize it as overly procedural and not idiomatic “modern Rust” (few iterators or enums); the snippet after this list illustrates the kind of contrast they mean.
  • Multiple commenters suspect the README and portions of the code are LLM-generated (“vibe-coded”), citing telltale comments, emoji use, file naming, and commit style.
  • Commenters debate whether AI-generated Rust will “rot” code quality: some say it’s fine as long as humans clean up the output and focus their effort on the hard parts; others say the sloppy comments and duplicated patterns reveal shallow understanding.
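
As an illustration of the style contrast commenters describe (standalone code, not taken from the repository), the two functions below compute the same dot product, first in the procedural index-loop style being criticized, then with the iterator chain critics consider more idiomatic:

```rust
/// Procedural style: explicit index loop and a mutable accumulator.
fn dot_procedural(a: &[f64], b: &[f64]) -> f64 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Iterator style: zip the slices, multiply pairwise, and sum; no indexing, no `mut`.
fn dot_idiomatic(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let (a, b) = ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]);
    assert_eq!(dot_procedural(&a, &b), dot_idiomatic(&a, &b));
}
```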

Training Data, Behavior & Toy Nature

  • The model’s training data is tiny and embedded directly in main.rs (dozens of factual statements).
  • When prompted off-distribution, it quickly breaks down into nonsense outputs, reinforcing that this is a learning toy, not a usable LLM.
  • Suggestions include training on public instruction and text datasets from Hugging Face and adding numerical gradient checks to validate the backpropagation (see the sketch after this list).
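
A numerical gradient check compares an analytic gradient against a central finite difference. The sketch below is a minimal, self-contained version using ndarray; the loss closure and function names are placeholders for illustration, not the project's API:

```rust
use ndarray::Array1;

/// Perturb each parameter by ±eps and compare
/// (f(w + eps) - f(w - eps)) / (2 * eps) against the analytic grad[i].
/// Returns the worst relative error over all parameters.
fn gradient_check<F>(loss: F, w: &Array1<f64>, grad: &Array1<f64>, eps: f64) -> f64
where
    F: Fn(&Array1<f64>) -> f64,
{
    let mut max_rel_err = 0.0_f64;
    for i in 0..w.len() {
        let mut plus = w.clone();
        let mut minus = w.clone();
        plus[i] += eps;
        minus[i] -= eps;
        let numeric = (loss(&plus) - loss(&minus)) / (2.0 * eps);
        let analytic = grad[i];
        let rel = (numeric - analytic).abs() / (numeric.abs() + analytic.abs() + 1e-12);
        max_rel_err = max_rel_err.max(rel);
    }
    max_rel_err
}

fn main() {
    // Toy example: f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    let w = Array1::from(vec![0.3, -1.2, 2.0]);
    let loss = |v: &Array1<f64>| 0.5 * v.dot(v);
    let grad = w.clone();
    println!("max relative error: {:e}", gradient_check(loss, &w, &grad, 1e-5));
}
```

A maximum relative error on the order of 1e-6 or below is the usual sign that the analytic gradient matches; values near 1.0 typically indicate a backpropagation bug.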

Rust vs Python: Tooling, Ecosystem & Performance

  • Several express relief at “just cargo run” compared to repeated stories of Python dependency hell.
  • A long subthread debates:
    • Whether easy dependency inclusion (Cargo/npm style) is a feature or a trap that encourages dependency bloat and security risk.
    • Centralized package registries vs more intentional, frictionful dependency models (Zig/Odin-style).
    • Python packaging’s longstanding problems vs improvements with pyproject.toml and tools like uv (often described as “cargo for Python”).
    • Some argue Python’s ecosystem is fundamentally flawed; others defend it as the de facto ML lingua franca whose C/C++ backends handle performance.

Rust in the ML Stack & Future Work

  • Commenters are excited to see a pure-Rust transformer and note that Rust’s memory safety helps avoid subtle bugs such as buffer overflows in transformer implementations.
  • A few suggest GPU support, proper tokenization (e.g., BPE; a toy merge loop is sketched after this list), and fixing architectural issues (e.g., reusing the same transformer block instance rather than separate layers).
  • Broader discussion touches on whether more of the AI ecosystem will, or should, migrate from Python to Rust, C++, or other languages; opinions in the thread are mixed.
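
On the tokenization suggestion: byte-pair encoding builds a vocabulary by repeatedly merging the most frequent adjacent token pair. The toy trainer below is an illustration of that core merge loop, not the project's code; it skips pre-tokenization, byte fallback, and vocabulary bookkeeping that a real BPE tokenizer needs:

```rust
use std::collections::HashMap;

/// Toy BPE trainer: start from characters, then repeatedly merge the most
/// frequent adjacent pair into a single token. Returns the merge rules learned.
fn train_bpe(text: &str, num_merges: usize) -> Vec<(String, String)> {
    let mut tokens: Vec<String> = text.chars().map(|c| c.to_string()).collect();
    let mut merges = Vec::new();

    for _ in 0..num_merges {
        // Count every adjacent pair in the current token sequence.
        let mut counts: HashMap<(String, String), usize> = HashMap::new();
        for pair in tokens.windows(2) {
            *counts.entry((pair[0].clone(), pair[1].clone())).or_insert(0) += 1;
        }
        // Pick the most frequent pair; stop once nothing repeats.
        let Some((best, n)) = counts.into_iter().max_by_key(|(_, n)| *n) else { break };
        if n < 2 {
            break;
        }
        // Replace every occurrence of the pair with the merged token.
        let mut merged = Vec::with_capacity(tokens.len());
        let mut i = 0;
        while i < tokens.len() {
            if i + 1 < tokens.len() && tokens[i] == best.0 && tokens[i + 1] == best.1 {
                merged.push(format!("{}{}", best.0, best.1));
                i += 2;
            } else {
                merged.push(tokens[i].clone());
                i += 1;
            }
        }
        tokens = merged;
        merges.push(best);
    }
    merges
}

fn main() {
    let merges = train_bpe("low lower lowest low low", 10);
    println!("{merges:?}");
}
```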