Building LLMs from the Ground Up: A 3-Hour Coding Workshop

Overall reception & learning value

  • Many commenters praise the workshop as clear, practical, and a good way to revisit fundamentals of transformers and LLMs.
  • Several say it hits a “just right” level for people already comfortable with deep learning/PyTorch and who don’t want ultra-low-level autograd-from-scratch material.
  • Others share additional resources (e.g., other “GPT from scratch” write-ups and videos) that complement this one, each emphasizing different aspects (training vs. inference, numpy-level math vs. framework use).

Data cleaning, instruction following, and real-world models

  • Some ask for more detail on how major models clean and structure training data, suggesting this is where long-term differentiation will lie.
  • Commenters point to sections in large model papers (e.g., “steerability” / instruction tuning) as partial answers.
  • One thread stresses that unstructured pretraining alone yields a model that merely continues text (“babbling”); instruction-following behavior requires further structured training, such as supervised fine-tuning on instruction data and reinforcement learning from human feedback.
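To make the pretraining-vs-instruction-tuning distinction concrete, here is a minimal sketch of how an (instruction, response) pair might be wrapped in a prompt template before supervised fine-tuning. The template is modeled loosely on the widely used Alpaca-style format; the exact wording and field names are illustrative assumptions, not taken from any specific model's pipeline.

```python
def format_instruction_example(instruction: str, response: str) -> str:
    """Wrap a raw (instruction, response) pair in a prompt template.

    A pretrained model only learns to continue text; fine-tuning on many
    examples formatted like this teaches it to answer instructions.
    """
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

example = format_instruction_example(
    "Summarize the plot of Hamlet in one sentence.",
    "A Danish prince seeks revenge for his father's murder, "
    "with tragic consequences for nearly everyone involved.",
)
print(example)
```

During fine-tuning, the loss is typically computed only on the response portion, so the model learns the mapping from instruction to answer rather than memorizing the boilerplate.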

“From scratch” and abstraction level debate

  • Significant discussion centers on whether building an LLM “from the ground up” should use PyTorch or go lower-level (numpy, custom autograd, or even C/assembly).
  • One camp: PyTorch’s `nn` module is already “low level enough” for understanding transformers; going deeper is mostly useful for framework/hardware developers.
  • Another camp: “from scratch” should avoid major dependencies and expose more of the mechanics; they cite bottom-up tutorials (e.g., autograd by hand) as more educational.
  • Some propose a pedagogical progression: basic programming → text processing → n‑grams/Markov chains → then transformers.
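The n-grams/Markov-chains step in that progression can be sketched in a few lines of dependency-free Python: a bigram (order-1 Markov) text generator built from a toy corpus. Everything here is illustrative; a real tutorial would use a larger corpus, higher-order n-grams, and smoothing.

```python
import random
from collections import defaultdict

def build_bigram_model(text: str) -> dict:
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    model = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model: dict, start: str, length: int, seed: int = 0) -> str:
    """Sample a word sequence by repeatedly picking a random successor."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    out = [start]
    for _ in range(length - 1):
        successors = model.get(out[-1])
        if not successors:
            break  # dead end: the last word never appears mid-corpus
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat the cat ran to the door"
model = build_bigram_model(corpus)
print(generate(model, "the", 6))
```

The same sampling loop, with the successor-frequency table replaced by a learned neural network predicting next-token probabilities, is essentially how transformer inference works, which is what makes this a natural stepping stone.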

Should people build their own LLMs?

  • Skeptical voices argue most individuals can’t train competitive models and should focus on building applications on top of existing LLMs.
  • Others counter that educational value, intuition-building, and niche/small models on modest hardware still justify learning to build and train models.

Alternative/simple language models and terminology

  • A long subthread debates a non-LLM “transformer” project based on n‑grams/Markov chains plus rules.
  • Critics say calling it a “transformer” is misleading in today’s NLP context, where that term refers to a specific architecture.
  • The author defends the broader mathematical meaning of “transform/transformer” and argues that n‑grams, POS tagging, and embeddings are intertwined in modern systems.
  • Multiple commenters push back that terminology in ML has become specialized and reuse of core terms can confuse users.

Platform and tooling notes

  • Some Windows users wonder about compatibility; others recommend WSL2 with CUDA as a practical route.
  • A separate guide is mentioned for training nanoGPT on cloud GPUs for relatively low cost, though its practical utility is described as mostly educational.

Language around “coding”

  • A minor tangent discusses dislike for the term “coding” versus “programming” or “software engineering.”
  • Views differ by culture and personal taste; some see “coder” as less professional, others embrace it as long-standing slang.