I tried vibe coding in BASIC and it didn't go well

Model Training, Context, and Niche Platforms

  • Many comments note that BASIC and retro platforms are underrepresented in training data, so default models predict poorly without help.
  • For niche languages (Pike, Snobol, Unicon, WebGPU/WGSL, Zig, weird BASIC dialects), people report very high error rates and unusable “vibe coding.”
  • Proposed mitigations: fine-tune local models on curated examples, or use RAG/context injection (manuals, tutorials, API docs) rather than relying purely on “intrinsic” model knowledge.
  • Large context models (e.g., million-token windows) are seen as promising for stuffing in docs and codebases, though there’s confusion about how such huge contexts practically work and some skepticism about trade-offs.
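The RAG/context-injection mitigation above can be sketched in a few lines. This is a minimal illustration, not any commenter's actual setup: retrieval here is naive keyword overlap over hand-curated manual excerpts (real pipelines use embeddings), and the snippet contents are invented examples of Applesoft documentation.

```python
# Minimal sketch of context injection for a niche dialect: retrieve
# relevant manual excerpts and prepend them to the prompt, rather than
# relying on the model's intrinsic knowledge of Applesoft BASIC.
# Snippets and ranking are illustrative; real setups use embeddings.

DOC_SNIPPETS = [
    "Applesoft BASIC: HPLOT X,Y draws a point in high-resolution graphics mode.",
    "Applesoft BASIC: DIM A(10) allocates an 11-element array (indices 0-10).",
    "Applesoft BASIC: strings are limited to 255 characters; use MID$ to slice.",
]

def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    """Rank snippets by crude keyword overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(
        snippets,
        key=lambda s: len(q_words & set(s.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(question: str) -> str:
    """Prepend retrieved reference material to the task description."""
    context = "\n".join(retrieve(question, DOC_SNIPPETS))
    return f"Reference material:\n{context}\n\nTask: {question}"

prompt = build_prompt("How do I draw a point with HPLOT in Applesoft BASIC?")
```

The same shape works whether the context comes from RAG retrieval or from stuffing whole manuals into a large context window; the difference is only how much you can afford to include.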

Experiences with Vibe Coding: Successes and Failures

  • Some report strong wins: small games in Applesoft/6502, BASIC translations from old books, web features implemented mostly unattended, HomeAssistant automations, API test suites, etc.
  • Others find vibe coding unusable even in mainstream stacks: LLMs mixing outdated and modern .NET/Tailwind usage, failing on advanced TypeScript typing, or struggling to port Erlang/Elixir to Java.
  • An emerging consensus: vibe coding works best when you already understand the domain, keep changes small and iterative, and treat the model like a junior developer.

Tooling, Agents, and Feedback Loops

  • Several argue the experiment is “unfairly primitive”: without tools to compile, run, and inspect output (or capture screenshots), the model can’t self-correct syntactic or visual errors.
  • Agentic setups with planners, MCP tools, search, and documentation lookup are described as significantly more effective than raw chat.
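The compile-run-inspect loop these commenters describe can be sketched as follows. This is a hypothetical illustration of the loop shape, not any specific agent framework: the `generate` callable stands in for an LLM call, stubbed here with canned attempts so the example runs on its own.

```python
# Sketch of a self-correction loop: run the model's candidate code,
# capture the error output, and feed it back so the model can fix
# syntactic/runtime errors instead of guessing blind.
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str) -> tuple[bool, str]:
    """Execute candidate Python in a subprocess; return (ok, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=10,
        )
        return proc.returncode == 0, proc.stderr
    finally:
        os.unlink(path)

def repair_loop(prompt: str, generate, max_rounds: int = 3) -> str:
    """Regenerate with the error message appended until the code runs."""
    code = generate(prompt)
    for _ in range(max_rounds):
        ok, err = run_candidate(code)
        if ok:
            return code
        code = generate(f"{prompt}\n\nYour last attempt failed with:\n{err}\nFix it.")
    return code

# Stub "model": first attempt raises a NameError, second is correct.
attempts = iter(['print(answr)', 'print("answer")'])
fixed = repair_loop("print the answer", lambda p: next(attempts))
```

Raw chat lacks exactly this loop: without the ability to execute and observe, the model never sees the `NameError` it produced.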

Specification, Tests, and Goal-Seeking Behavior

  • Models happily “make tests pass” by deleting features or editing either tests or code, because the prompt goal is underspecified.
  • This is characterized as expected behavior: models optimize for the stated objective, not for unstated business logic or risk. Good prompts and test descriptions are crucial.

Expectations, Intelligence, and Broader Debates

  • One camp sees LLMs as impressive but fundamentally limited pattern matchers, unlikely to lead to “godlike” AGI; another argues it’s too early to dismiss long-term progress.
  • Analogies abound: LLMs as smart-but-foolish talking dogs, jinn granting literal wishes, or dream-like systems that feel coherent locally but fall apart under close inspection.
  • Several stress that they’re powerful tools, not magic wands: productivity gains are real in common, well-documented domains, but fall off sharply on fringe tech and poorly specified work.