My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)

Training data, cloning, and originality

  • Many argue the model likely saw numerous Space Invaders clones in training, so the result may be sophisticated “copy-paste with extra steps” rather than invention.
  • Others counter that humans also recombine prior knowledge, and that models demonstrably handle entirely new requirements when given detailed specs.
  • Debate centers on whether LLMs are “just recall”:
    • Critics say output is mostly lossy compression of training data with limited true reasoning.
    • Supporters point to compression itself as a powerful form of understanding, plus hallucinations as evidence it’s not literal memorization.
  • Some small-scale code comparisons show similarity in structure and idioms but not verbatim copying, suggesting reuse of patterns rather than wholesale plagiarism (a rough sketch of such a comparison follows this list).
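A rough sketch of that kind of comparison, under illustrative assumptions: token 5-grams with Jaccard overlap as the similarity measure, and placeholder file names. High overlap without long verbatim matches points at shared idioms rather than copying.

```python
import re

def ngrams(code: str, n: int = 5) -> set[tuple[str, ...]]:
    # Crude lexer: words and individual punctuation marks (illustrative choice).
    tokens = re.findall(r"\w+|[^\w\s]", code)
    return {tuple(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 0))}

def similarity(a: str, b: str) -> float:
    # Jaccard overlap of token 5-grams: 0.0 = disjoint, 1.0 = identical streams.
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb) if (ga | gb) else 0.0

# Placeholder file names; two idiomatic game loops typically score well
# above unrelated code but far below 1.0.
print(similarity(open("invaders_llm.js").read(), open("invaders_clone.js").read()))
```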

Benchmarks, pelicans, and artist concerns

  • The long‑running “SVG pelican on a bicycle” prompt is discussed as a benchmark that models may now be overfitting to, especially since the prompt went viral (a trivial example of the output format follows this list).
  • This leads to a broader point: public benchmarks get “burned” once labs can train on or game them, which motivates keeping private test sets.
  • Artists worry that anything put online becomes training data and is commoditized; suggestions include physical exhibitions or DRM’d portfolios, but the consensus is that DRM would be brittle and easily bypassed.
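For readers who haven’t seen the benchmark: the prompt asks a text-only model to emit raw SVG markup, which is then rendered and judged by eye. A trivial hand-written illustration of that output format (nowhere near a pelican; all shapes, coordinates, and the file name are placeholders):

```python
# Minimal illustration of the format the benchmark elicits: plain SVG
# produced as text. Everything here is arbitrary example data.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
  <ellipse cx="80" cy="60" rx="35" ry="25" fill="lightgrey"/>  <!-- body -->
  <circle cx="120" cy="40" r="12" fill="white"/>               <!-- head -->
  <polygon points="130,40 165,46 130,52" fill="orange"/>       <!-- beak -->
</svg>"""

with open("not_a_pelican.svg", "w") as f:
    f.write(svg)
```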

Local models and hardware (Apple vs others)

  • A big theme is how impressive it is that an M2/M4 Mac with 64–128GB unified memory can run MoE models in the ~100–200B‑parameter class locally (GLM-4.5 Air itself is ~106B) and generate full games.
  • Disagreement over how “exceptional” that hardware is: common for high‑end Macs, but far above typical consumer laptops.
  • On PCs, running comparable models usually requires 24–48GB+ of GPU VRAM or slow CPU inference; unified memory gives Macs an advantage for large models (see the back-of-the-envelope arithmetic after this list).
  • Alternatives include multi‑GPU rigs, high‑RAM EPYC servers, new AMD Strix Halo / Framework Desktop, or simply renting GPUs from cloud providers.
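Some back-of-the-envelope arithmetic shows why unified memory matters, assuming the quantized weights dominate memory use (KV cache and activations add several more GB on top). The parameter counts for GLM-4.5 Air (~106B total, ~12B active per token) are from its model card; the rest is plain arithmetic.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    # GB needed just for the quantized weights; ignores KV cache/activations.
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# GLM-4.5 Air: ~106B total parameters (MoE, ~12B active per token).
print(weight_gb(106, 3))  # ~40 GB at 3-bit  -> fits on a 64GB Mac
print(weight_gb(106, 4))  # ~53 GB at 4-bit  -> tight on 64GB, comfortable on 128GB
print(weight_gb(106, 8))  # ~106 GB at 8-bit -> needs a 128GB machine
```

The same arithmetic explains the PC gap: a 24–48GB GPU cannot hold those weights at any common quantization, so layers spill into system RAM and inference slows sharply.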

Capabilities and limits of LLM coding

  • Commenters note that LLMs excel at well‑trodden tasks (classic tutorials, boilerplate, UI patterns) but often struggle with novel, idiosyncratic problems and unfamiliar platforms.
  • Some find “agentic coding” magical yet fragile: great for simple greenfield projects, frustrating for evolving real codebases without tests.
  • Others describe large productivity gains for glue code, obscure tools (e.g., ffmpeg, jq, AppleScript), quick throwaway utilities, and educational explanations.
  • Several emphasize disciplined workflows: small iterative prompts, unit tests, and line‑by‑line review; otherwise quality, performance, and security can suffer (a sketch of that workflow’s shape follows this list).
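A minimal sketch of what that workflow shape can look like in practice: extract LLM-generated logic into small, pure functions and pin behavior with tests before iterating. The function and test cases below are hypothetical, picked to match the Space Invaders theme.

```python
def rects_overlap(ax: float, ay: float, aw: float, ah: float,
                  bx: float, by: float, bw: float, bh: float) -> bool:
    # Axis-aligned bounding-box collision (e.g., shot vs. invader):
    # the kind of small, reviewable unit worth pinning with tests.
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def test_rects_overlap():
    assert rects_overlap(0, 0, 10, 10, 5, 5, 10, 10)      # overlapping boxes
    assert not rects_overlap(0, 0, 10, 10, 20, 20, 5, 5)  # disjoint boxes
    assert not rects_overlap(0, 0, 10, 10, 10, 0, 5, 5)   # touching edges don't count
```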

Open vs closed models, fine‑tuning, and economics

  • Open models are seen as astonishingly strong and only ~6 months behind the top proprietary labs, with rapid progress ever since the original LLaMA leak.
  • Some speculate this erodes moats of providers like Anthropic/OpenAI, but others note:
    • High‑end cloud models still outperform local ones and are cheaper than buying/operating powerful hardware for most users.
    • Many expect a database‑like landscape: a mix of strong open models and premium proprietary ones.
  • Fine‑tuning/LoRA: tools like peft, Unsloth, Axolotl, and MLX are recommended, but multiple comments warn that naïve fine‑tuning can degrade general capabilities; it is best reserved for narrow tasks or for distilling down to small specialized models (a minimal LoRA sketch follows this list).
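A minimal LoRA setup sketch using peft (one of the tools named above), assuming a Hugging Face causal LM. The base model and hyperparameters are illustrative placeholders, not recommendations from the thread.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any small Hugging Face causal LM works the same way.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

config = LoraConfig(
    r=8,                                  # adapter rank (illustrative)
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

# Only the low-rank adapters train; the base weights stay frozen, which is
# why narrow-task fine-tuning is cheap and why it can still skew behavior.
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```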

Use cases, local adoption, and “real engineering”

  • Some argue a Space Invaders clone isn’t representative of “real engineering” because the requirements are fully known and heavily represented in training data. Others respond that implementing it still exercises genuine engineering skills and patterns.
  • Local LLMs are compared to Linux: valuable to enthusiasts, students, and developers who want privacy, low latency, or offline use, while most people will likely stay on SaaS.
  • There is ongoing concern about overhyping capabilities, but also recognition that even “merely remixing” models are already changing workflows and expanding what individuals can build.