Building LLMs from the Ground Up: A 3-Hour Coding Workshop
Overall reception & learning value
- Many commenters praise the workshop as clear, practical, and a good way to revisit fundamentals of transformers and LLMs.
- Several say it hits a “just right” level for people already comfortable with deep learning/PyTorch and who don’t want ultra-low-level autograd-from-scratch material.
- Others share additional resources (e.g., other “GPT from scratch” writeups/videos) that complement the workshop, each emphasizing different aspects (training vs. inference, numpy-level math vs. framework use).
Data cleaning, instruction following, and real-world models
- Some ask for more detail on how major models clean and structure training data, suggesting this is where long-term differentiation will lie.
- Commenters point to sections in large model papers (e.g., “steerability” / instruction tuning) as partial answers.
- One thread stresses that unstructured pretraining alone yields a babbling model; instruction-following behavior requires additional structured training with human feedback.
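The structured training the thread refers to starts with supervised fine-tuning on instruction–response pairs. As a hedged illustration (the template below is modeled on the widely used Alpaca-style format; the function name and wording are illustrative, not from the workshop), the formatting step looks roughly like this:

```python
# Illustrative sketch: turning (instruction, response) pairs into training
# text. Modeled on the Alpaca-style prompt template; real pipelines add
# further stages (RLHF, DPO, etc.) beyond this formatting step.

def format_instruction_example(instruction: str, response: str) -> str:
    """Render one supervised fine-tuning example as a single training string."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

example = format_instruction_example(
    "Summarize the difference between pretraining and instruction tuning.",
    "Pretraining predicts the next token on raw text; instruction tuning "
    "fine-tunes on structured instruction-response pairs.",
)
print(example.splitlines()[2])  # -> ### Instruction:
```

Training on many such pairs is what moves a model from free-form continuation toward following instructions.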
“From scratch” and abstraction level debate
- Significant discussion centers on whether building an LLM “from the ground up” should use PyTorch or go lower-level (numpy, custom autograd, or even C/assembly).
- One camp: PyTorch is “low level enough” for understanding transformers; going deeper is mostly for framework/hardware developers.
- Another camp: “from scratch” should avoid major dependencies and expose more of the mechanics; they cite bottom-up tutorials (e.g., autograd by hand) as more educational.
- Some propose a pedagogical progression: basic programming → text processing → n‑grams/Markov chains → then transformers.
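The mechanics the “no major dependencies” camp wants exposed are small enough to write in plain Python. A minimal, dependency-free sketch of scaled dot-product attention (vectors as plain lists; illustrative only, not the workshop's code):

```python
import math

# Dependency-free sketch of single-head scaled dot-product attention.
# Vectors are plain Python lists; a real implementation uses tensors.

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Each query attends over all key/value pairs."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return out

# With identical keys, the weights are uniform and the output averages values.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [4.0, 0.0]],
)
print(result)  # -> [[3.0, 0.0]]
```

Roughly twenty lines like these cover the core operation; the PyTorch-is-enough camp's point is that frameworks wrap exactly this, plus autograd and GPU execution.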
Should people build their own LLMs?
- Skeptical voices argue most individuals can’t train competitive models and should focus on building applications on top of existing LLMs.
- Others counter that educational value, intuition-building, and niche/small models on modest hardware still justify learning to build and train models.
Alternative/simple language models and terminology
- A long subthread debates a non-LLM “transformer” project based on n‑grams/Markov chains plus rules.
- Critics say calling it a “transformer” is misleading in today’s NLP context, where that term refers to a specific architecture.
- The author defends the broader mathematical meaning of “transform/transformer” and argues that n‑grams, POS tagging, and embeddings are intertwined in modern systems.
- Multiple commenters push back that terminology in ML has become specialized and reuse of core terms can confuse users.
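Whatever one calls it, the n-gram/Markov-chain approach at issue is simple to state. A minimal bigram generator (illustrative only; not code from the project under discussion):

```python
import random
from collections import defaultdict

# Minimal bigram Markov-chain text generator, the kind of non-neural
# language model debated in the subthread. Purely illustrative.

def build_bigram_model(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, start, length=10, seed=0):
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length - 1):
        followers = model.get(word)
        if not followers:  # dead end: no observed continuation
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

model = build_bigram_model("the cat sat on the mat and the cat ran")
print(generate(model, "the", length=5))
```

Unlike a transformer, such a model conditions only on the previous word (or the previous n−1 words for higher-order n-grams), which is the heart of the terminology objection.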
Platform and tooling notes
- Some Windows users wonder about compatibility; others recommend WSL2 with CUDA as a practical route.
- A separate guide is mentioned for training nanoGPT on cloud GPUs at relatively low cost, though the resulting model is described as educational rather than practically useful.
Language around “coding”
- A minor tangent discusses dislike for the term “coding” versus “programming” or “software engineering.”
- Views differ by culture and personal taste; some see “coder” as less professional, others embrace it as long-standing slang.