Building LLMs from the Ground Up: A 3-Hour Coding Workshop
Overall reception & learning value
- Many commenters praise the workshop as clear, practical, and a good way to revisit fundamentals of transformers and LLMs.
- Several say it hits a “just right” level for people already comfortable with deep learning/PyTorch and who don’t want ultra-low-level autograd-from-scratch material.
- Others share additional resources (e.g., other “GPT from scratch” writeups/videos) that complement the workshop, each emphasizing different aspects (training vs. inference, numpy-level math vs. framework use).
Data cleaning, instruction following, and real-world models
- Some ask for more detail on how major models clean and structure training data, suggesting this is where long-term differentiation will lie.
- Commenters point to sections in large model papers (e.g., “steerability” / instruction tuning) as partial answers.
- One thread stresses that unstructured pretraining alone yields a babbling model; instruction-following behavior requires additional structured training with human feedback.
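The structured training the thread refers to starts with supervised fine-tuning on instruction–response pairs. As a hedged illustration (the template below is modeled on the widely used Alpaca-style format; the function name and wording are illustrative, not from the workshop), the formatting step looks roughly like this:

```python
# Illustrative sketch: turning (instruction, response) pairs into training
# text. Modeled on the Alpaca-style prompt template; real pipelines add
# further stages (RLHF, DPO, etc.) beyond this formatting step.

def format_instruction_example(instruction: str, response: str) -> str:
    """Render one supervised fine-tuning example as a single training string."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

example = format_instruction_example(
    "Summarize the difference between pretraining and instruction tuning.",
    "Pretraining predicts the next token on raw text; instruction tuning "
    "fine-tunes on structured instruction-response pairs.",
)
print(example.splitlines()[2])  # -> ### Instruction:
```

Training on many such pairs is what moves a model from free-form continuation toward following instructions.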
“From scratch” and abstraction level debate
- Significant discussion centers on whether building an LLM “from the ground up” should use PyTorch or go lower-level (numpy, custom autograd, or even C/assembly).
- One camp: PyTorch is “low level enough” for understanding transformers; going deeper is mostly for framework/hardware developers.
- Another camp: “from scratch” should avoid major dependencies and expose more of the mechanics; they cite bottom-up tutorials (e.g., autograd by hand) as more educational.
- Some propose a pedagogical progression: basic programming → text processing → n‑grams/Markov chains → then transformers.
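The mechanics the “no major dependencies” camp wants exposed are small enough to write in plain Python. A minimal, dependency-free sketch of scaled dot-product attention (vectors as plain lists; illustrative only, not the workshop's code):

```python
import math

# Dependency-free sketch of single-head scaled dot-product attention.
# Vectors are plain Python lists; a real implementation uses tensors.

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Each query attends over all key/value pairs."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return out

# With identical keys, the weights are uniform and the output averages values.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [4.0, 0.0]],
)
print(result)  # -> [[3.0, 0.0]]
```

Roughly twenty lines like these cover the core operation; the PyTorch-is-enough camp's point is that frameworks wrap exactly this, plus autograd and GPU execution.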
Should people build their own LLMs?
- Skeptical voices argue most individuals can’t train competitive models and should focus on building applications on top of existing LLMs.
- Others counter that educational value, intuition-building, and niche/small models on modest hardware still justify learning to build and train models.
Alternative/simple language models and terminology
- A long subthread debates a non-LLM “transformer” project based on n‑grams/Markov chains plus rules.
- Critics say calling it a “transformer” is misleading in today’s NLP context, where that term refers to a specific architecture.
- The author defends the broader mathematical meaning of “transform/transformer” and argues that n‑grams, POS tagging, and embeddings are intertwined in modern systems.
- Multiple commenters push back that terminology in ML has become specialized and reuse of core terms can confuse users.
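Whatever one calls it, the n-gram/Markov-chain approach at issue is simple to state. A minimal bigram generator (illustrative only; not code from the project under discussion):

```python
import random
from collections import defaultdict

# Minimal bigram Markov-chain text generator, the kind of non-neural
# language model debated in the subthread. Purely illustrative.

def build_bigram_model(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, start, length=10, seed=0):
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length - 1):
        followers = model.get(word)
        if not followers:  # dead end: no observed continuation
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

model = build_bigram_model("the cat sat on the mat and the cat ran")
print(generate(model, "the", length=5))
```

Unlike a transformer, such a model conditions only on the previous word (or the previous n−1 words for higher-order n-grams), which is the heart of the terminology objection.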
Platform and tooling notes
- Some Windows users wonder about compatibility; others recommend WSL2 with CUDA as a practical route.
- A separate guide is mentioned for training nanoGPT on cloud GPUs at relatively low cost, though the resulting model is described as educational rather than practically useful.
Language around “coding”
- A minor tangent discusses dislike for the term “coding” versus “programming” or “software engineering.”
- Views differ by culture and personal taste; some see “coder” as less professional, others embrace it as long-standing slang.