2024-05-19

Llama3 implemented from scratch

Project & Purpose

The repository reimplements Llama 3 inference “from scratch” with a detailed, step-by-step explanation.
Several commenters see it as an educational tool, not novel research.
Some compare it to previous “from scratch” projects (e.g., Llama 2, GPT-2 in minimal code, llama2.c) and say the architecture is nearly identical, so the main value is teaching, not innovation.
A few criticize the style (anime, all-lowercase) or readability; others dismiss this as nitpicking.

Inference vs Training & Implementation Complexity

This project appears focused on inference, not training; some wish for an equally clear, open-sourced training walkthrough.
Multiple comments emphasize that core LLM code is conceptually simple; the real difficulty lies in:
- Distributed training at scale and GPU utilization.
- Access to hardware, high-quality data, and preprocessing.
- RLHF and large human-annotation pipelines.
Some individuals report implementing inference for sizable models within weeks, using reference code to validate their intermediate tensors.
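The tensor-validation workflow mentioned above can be sketched roughly as follows. This is a minimal illustration, not anything taken from the repository or the thread: the checkpoint names ("embed", "layer0.attn", ...), the nested-list tensor format, and the tolerance are all assumptions chosen to keep the example dependency-free.

```python
def max_abs_diff(a, b):
    """Largest absolute elementwise difference between two equally shaped nested lists."""
    a_num = isinstance(a, (int, float))
    b_num = isinstance(b, (int, float))
    if a_num and b_num:
        return abs(a - b)
    if a_num or b_num or len(a) != len(b):
        raise ValueError("shape mismatch between reference and candidate")
    return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)


def first_divergence(reference, candidate, atol=1e-4):
    """Return the name of the first checkpoint that is missing or differs by more than atol."""
    for name, ref_tensor in reference.items():  # dicts keep insertion order
        if name not in candidate:
            return name
        if max_abs_diff(ref_tensor, candidate[name]) > atol:
            return name
    return None


# Hypothetical activations dumped from a reference implementation...
reference = {
    "embed": [[0.1, 0.2], [0.3, 0.4]],
    "layer0.attn": [[1.0, 2.0], [3.0, 4.0]],
    "layer0.mlp": [[0.5, 0.5], [0.5, 0.5]],
}
# ...and from a reimplementation with a bug in the attention step.
candidate = {
    "embed": [[0.1, 0.2], [0.3, 0.4]],
    "layer0.attn": [[1.0, 2.0], [3.0, 4.5]],
    "layer0.mlp": [[0.5, 0.5], [0.5, 0.5]],
}

diverged_at = first_divergence(reference, candidate)
```

Checking layer by layer in forward order localizes a bug to the first step that disagrees, which is usually more useful than comparing only the final logits.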

Transformers, Architectures & Alternatives

Discussion revisits why transformers dominate: standardized blocks, easy parallelization, GPU efficiency.
Some criticize overuse of transformers in non-language domains; others respond that they now work well across text, images, audio, and robotics.
SSMs (e.g., Mamba) are debated:
- One side: linear/logarithmic-time sequence mixing in place of quadratic attention is more than a small optimization and could be a big deal.
- Other side: still mostly an efficiency tweak; transformers remain functionally general and entrenched.
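The scaling argument behind this debate can be made concrete with a toy comparison. The sketch below is only an assumption-laden illustration: it uses a scalar recurrence with fixed coefficients `a` and `b` (hypothetical values), whereas Mamba uses learned, input-dependent parameters, and it merely counts query-key pairs for causal attention rather than computing attention.

```python
def ssm_scan(xs, a=0.9, b=0.1):
    """Toy linear state-space recurrence: h_t = a * h_{t-1} + b * x_t.

    One pass over the sequence with a constant-size state, so the work
    grows linearly with sequence length.
    """
    h = 0.0
    outputs = []
    for x in xs:
        h = a * h + b * x
        outputs.append(h)
    return outputs


def causal_attention_pair_count(n):
    """Number of (query, key) pairs in causal self-attention over n tokens.

    Token t attends to tokens 1..t, giving n * (n + 1) / 2 pairs,
    which grows quadratically with sequence length.
    """
    return n * (n + 1) // 2


states = ssm_scan([1.0, 1.0])
pairs_short = causal_attention_pair_count(4)
pairs_long = causal_attention_pair_count(4096)
```

Whether this difference amounts to a new capability or only an efficiency gain is exactly what the two sides above disagree about.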
Ideas around KV-cache pruning and selective attention are raised; others note related existing research and unclear practical gains.
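As a rough picture of what KV-cache pruning could look like, the sketch below keeps the most recent positions plus the older positions with the highest importance score. The policy, the function name `prune_kv_cache`, and the `scores` input (for example, accumulated attention weight per cached position) are assumptions made for illustration; they are not a method proposed in the thread or used by any particular library.

```python
def prune_kv_cache(keys, values, scores, budget, recent=2):
    """Keep at most `budget` cache entries.

    Always keeps the last `recent` positions, then fills the remaining
    slots with the highest-scoring older positions. Original order is
    preserved in the returned lists.
    """
    n = len(keys)
    if n <= budget:
        return list(keys), list(values), list(scores)
    recent = min(recent, budget)
    recent_idx = set(range(n - recent, n))
    # Rank older positions by score, highest first.
    older = sorted(range(n - recent), key=lambda i: scores[i], reverse=True)
    keep = sorted(recent_idx | set(older[: budget - recent]))
    return (
        [keys[i] for i in keep],
        [values[i] for i in keep],
        [scores[i] for i in keep],
    )


# Placeholder strings stand in for per-position key/value tensors.
keys = ["k0", "k1", "k2", "k3", "k4", "k5"]
values = ["v0", "v1", "v2", "v3", "v4", "v5"]
scores = [0.9, 0.1, 0.5, 0.2, 0.3, 0.4]

kept_keys, kept_values, kept_scores = prune_kv_cache(keys, values, scores, budget=4)
```

Any such policy trades memory for the risk of discarding a position that later turns out to matter, which is one reason the practical gains are described as unclear.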

Industry Moats & Alignment

Several argue the real moat is:
- Being a few months ahead in model quality.
- Deep integration into products.
- Huge curated fine-tuning and RLHF pipelines.
An intense subthread debates “alignment” and censorship:
- Critics say safety layers produce bland, moralizing “slop” and block creative/edgy uses.
- Others counter that some safety is necessary, biases are unavoidable, and uncensored base or open models remain available.

Learning Paths & Conceptual Resources

For newcomers, many recommend:
- Intro deep learning courses and books.
- Visual/interactive explanations of transformers and toy models (including spreadsheet and web demos).
Consensus: this repo is not the best starting point but a good later-stage, hands-on reference once basics are understood.
