I tried vibe coding in BASIC and it didn't go well

Model Training, Context, and Niche Platforms

  • Many comments note that BASIC and retro platforms are underrepresented in training data, so default models predict poorly without help.
  • For niche languages (Pike, Snobol, Unicon, WebGPU/WGSL, Zig, weird BASIC dialects), people report very high error rates and unusable “vibe coding.”
  • Proposed mitigations: fine-tune local models on curated examples, or use RAG/context injection (manuals, tutorials, API docs) rather than relying purely on “intrinsic” model knowledge.
  • Large context models (e.g., million-token windows) are seen as promising for stuffing in docs and codebases, though there’s confusion about how such huge contexts practically work and some skepticism about trade-offs.
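The RAG/context-injection mitigation above can be sketched in a few lines. This is a minimal illustration, not any commenter's actual setup: retrieval here is naive keyword overlap over hand-curated manual excerpts (real pipelines use embeddings), and the snippet contents are invented examples of Applesoft documentation.

```python
# Minimal sketch of context injection for a niche dialect: retrieve
# relevant manual excerpts and prepend them to the prompt, rather than
# relying on the model's intrinsic knowledge of Applesoft BASIC.
# Snippets and ranking are illustrative; real setups use embeddings.

DOC_SNIPPETS = [
    "Applesoft BASIC: HPLOT X,Y draws a point in high-resolution graphics mode.",
    "Applesoft BASIC: DIM A(10) allocates an 11-element array (indices 0-10).",
    "Applesoft BASIC: strings are limited to 255 characters; use MID$ to slice.",
]

def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    """Rank snippets by crude keyword overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(
        snippets,
        key=lambda s: len(q_words & set(s.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(question: str) -> str:
    """Prepend retrieved reference material to the task description."""
    context = "\n".join(retrieve(question, DOC_SNIPPETS))
    return f"Reference material:\n{context}\n\nTask: {question}"

prompt = build_prompt("How do I draw a point with HPLOT in Applesoft BASIC?")
```

The same shape works whether the context comes from RAG retrieval or from stuffing whole manuals into a large context window; the difference is only how much you can afford to include.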

Experiences with Vibe Coding: Successes and Failures

  • Some report strong wins: small games in Applesoft/6502, BASIC translations from old books, web features implemented mostly unattended, HomeAssistant automations, API test suites, etc.
  • Others find vibe coding unusable even in mainstream stacks: LLMs mixing outdated and modern .NET/Tailwind usage, failing on advanced TypeScript typing, or struggling to port Erlang/Elixir to Java.
  • An emerging consensus: vibe coding works best when you already understand the domain, keep changes small and iterative, and treat the model like a junior developer.

Tooling, Agents, and Feedback Loops

  • Several argue the experiment is “unfairly primitive”: without tools to compile, run, and inspect output (or capture screenshots), the model can’t self-correct syntactic or visual errors.
  • Agentic setups with planners, MCP tools, search, and documentation lookup are described as significantly more effective than raw chat.
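The compile-run-inspect loop these commenters describe can be sketched as follows. This is a hypothetical illustration of the loop shape, not any specific agent framework: the `generate` callable stands in for an LLM call, stubbed here with canned attempts so the example runs on its own.

```python
# Sketch of a self-correction loop: run the model's candidate code,
# capture the error output, and feed it back so the model can fix
# syntactic/runtime errors instead of guessing blind.
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str) -> tuple[bool, str]:
    """Execute candidate Python in a subprocess; return (ok, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=10,
        )
        return proc.returncode == 0, proc.stderr
    finally:
        os.unlink(path)

def repair_loop(prompt: str, generate, max_rounds: int = 3) -> str:
    """Regenerate with the error message appended until the code runs."""
    code = generate(prompt)
    for _ in range(max_rounds):
        ok, err = run_candidate(code)
        if ok:
            return code
        code = generate(f"{prompt}\n\nYour last attempt failed with:\n{err}\nFix it.")
    return code

# Stub "model": first attempt raises a NameError, second is correct.
attempts = iter(['print(answr)', 'print("answer")'])
fixed = repair_loop("print the answer", lambda p: next(attempts))
```

Raw chat lacks exactly this loop: without the ability to execute and observe, the model never sees the `NameError` it produced.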

Specification, Tests, and Goal-Seeking Behavior

  • Models happily “make tests pass” by deleting features or editing either tests or code, because the prompt goal is underspecified.
  • This is characterized as expected behavior: models optimize for the stated objective, not for unstated business logic or risk. Good prompts and test descriptions are crucial.

Expectations, Intelligence, and Broader Debates

  • One camp sees LLMs as impressive but fundamentally limited pattern matchers, unlikely to lead to “godlike” AGI; another argues it’s too early to dismiss long-term progress.
  • Analogies abound: LLMs as smart-but-foolish talking dogs, jinn granting literal wishes, or dream-like systems that feel coherent locally but fall apart under close inspection.
  • Several stress that they’re powerful tools, not magic wands: productivity gains are real in common, well-documented domains, but fall off sharply on fringe tech and poorly specified work.