I tried vibe coding in BASIC and it didn't go well
Model Training, Context, and Niche Platforms
- Many comments note that BASIC and retro platforms are underrepresented in training data, so off-the-shelf models generate poor code for them without extra help.
- For niche languages (Pike, SNOBOL, Unicon, WebGPU/WGSL, Zig, obscure BASIC dialects), people report error rates high enough to make “vibe coding” unusable.
- Proposed mitigations: fine-tune local models on curated examples, or use RAG/context injection (manuals, tutorials, API docs) rather than relying purely on “intrinsic” model knowledge; a minimal version of the injection idea is sketched after this list.
- Large-context models (e.g., million-token windows) are seen as promising for stuffing in docs and codebases, though there’s confusion about how such huge contexts work in practice and some skepticism about the trade-offs.
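
A minimal sketch of that context-injection idea, in Python. Everything here is illustrative: `complete()` stands in for whatever chat-completion client is in use, the retrieval is deliberately naive keyword matching (a real setup would use embeddings), and the file layout and prompt wording are invented.

```python
# Minimal context-injection sketch: prepend relevant manual pages to the
# prompt instead of relying on the model's intrinsic knowledge of a niche
# dialect. Retrieval here is naive keyword matching, purely for illustration.

from pathlib import Path

def load_reference_docs(doc_dir: str, query: str, max_chars: int = 20_000) -> str:
    """Keep documents whose text mentions any keyword from the task."""
    keywords = {w.lower() for w in query.split() if len(w) > 3}
    chunks, total = [], 0
    for path in sorted(Path(doc_dir).glob("*.txt")):
        text = path.read_text(errors="ignore")
        if any(k in text.lower() for k in keywords):
            take = text[: max_chars - total]
            chunks.append(f"--- {path.name} ---\n{take}")
            total += len(take)
            if total >= max_chars:
                break
    return "\n\n".join(chunks)

def build_prompt(task: str, doc_dir: str) -> str:
    """Inject the retrieved excerpts ahead of the task description."""
    docs = load_reference_docs(doc_dir, task)
    return (
        "You are writing Applesoft BASIC. Use ONLY the syntax shown in the\n"
        "reference material below; do not guess at keywords.\n\n"
        f"# Reference material\n{docs}\n\n"
        f"# Task\n{task}\n"
    )

# prompt = build_prompt("Draw a bouncing ball using HGR graphics", "./applesoft_manual")
# reply = complete(prompt)  # hypothetical chat-completion call
```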
Experiences with Vibe Coding: Successes and Failures
- Some report strong wins: small games in Applesoft/6502, BASIC translations from old books, web features implemented mostly unattended, HomeAssistant automations, API test suites, etc.
- Others find vibe coding unusable even in mainstream stacks: LLMs mixing outdated and modern .NET/Tailwind usage, failing on advanced TypeScript typing, or struggling to port Erlang/Elixir to Java.
- An emerging consensus: it works best when you already understand the domain, keep changes small and iterative, and treat the model like a junior dev.
Tooling, Agents, and Feedback Loops
- Several argue the experiment is “unfairly primitive”: without tools to compile, run, and inspect output (or capture screenshots), the model can’t self-correct syntactic or visual errors.
- Agentic setups with planners, MCP tools, search, and documentation lookup are described as significantly more effective than raw chat; a minimal version of that feedback loop is sketched below.
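
A toy version of the compile-run-repair loop those comments describe, again in Python. The `complete()` stub and the `basic-runner` command are placeholders, not real tools; the point is only that the interpreter’s actual error output goes back into the next prompt instead of the model guessing blind.

```python
# Toy compile-run-repair loop. `basic-runner` is a made-up interpreter
# command and complete() is a stub for whatever model client is in use.

import subprocess

def complete(prompt: str) -> str:
    """Placeholder for the chat-completion client; wire up a real one here."""
    raise NotImplementedError

def run_program(source: str) -> tuple[bool, str]:
    """Execute the generated code and capture whatever the toolchain reports."""
    with open("attempt.bas", "w") as f:
        f.write(source)
    try:
        proc = subprocess.run(
            ["basic-runner", "attempt.bas"],  # placeholder interpreter command
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out (possible infinite loop)"
    return proc.returncode == 0, proc.stdout + proc.stderr

def vibe_with_feedback(task: str, max_rounds: int = 5) -> str | None:
    """Generate, run, and repair until the program executes cleanly."""
    source = complete(f"Write a BASIC program: {task}")
    for _ in range(max_rounds):
        ok, output = run_program(source)
        if ok:
            return source
        # Feed the real interpreter output back instead of letting the
        # model guess what went wrong.
        source = complete(
            f"This program failed:\n{source}\n\n"
            f"Interpreter output:\n{output}\n\n"
            "Return a corrected program only."
        )
    return None  # give up after max_rounds; a human takes over
```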
Specification, Tests, and Goal-Seeking Behavior
- Models happily “make tests pass” by deleting features or by editing whichever of the tests or the code is easier to change, because the prompt’s goal is underspecified.
- This is characterized as expected behavior: models optimize for the stated objective, not for unstated business logic or risk. Good prompts and precise test assertions are crucial; the toy tests below show the difference.
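
A toy illustration of that underspecification point, using pytest-style tests. The `apply_discount` function and its numbers are invented for the example; the contrast is that the vague assertion still passes if the feature is gutted, while the pinned one does not.

```python
# Toy illustration of the goal-seeking failure mode. apply_discount and its
# numbers are invented. If the only stated goal is "make the tests pass",
# the vague test is trivially satisfied by deleting the discount logic.

def apply_discount(price: float, code: str) -> float:
    """The 'business logic' the model is supposed to preserve."""
    if code == "SAVE10":
        return round(price * 0.9, 2)
    return price

def test_vague():
    # Underspecified: still passes if apply_discount just returns price.
    assert apply_discount(100.0, "SAVE10") <= 100.0

def test_pinned():
    # Pins the actual rule, so gutting the feature breaks this test.
    assert apply_discount(100.0, "SAVE10") == 90.0
    assert apply_discount(100.0, "BOGUS") == 100.0
```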
Expectations, Intelligence, and Broader Debates
- One camp sees LLMs as impressive but fundamentally limited pattern matchers, unlikely to lead to “godlike” AGI; another argues it’s too early to dismiss long-term progress.
- Analogies abound: LLMs as smart-but-foolish talking dogs, jinn granting literal wishes, or dream-like systems that feel coherent locally but fall apart under close inspection.
- Several stress that they’re powerful tools, not magic wands: productivity gains are real in common, well-documented domains, but fall off sharply on fringe tech and poorly specified work.