Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

Antigravity 2.0 product experience & Google UX

  • Many commenters find Antigravity 2.0 rough: some must log in through the browser on every launch, credential caching breaks when no keyring or D‑Bus is available, and the TUI misbehaves while output is streaming.
  • Others report that credential caching works for them and prefer it to Gemini CLI, suggesting the problems depend on OS‑level setup.
  • Fragmented billing, abrupt quota changes, and the forced migration from Gemini CLI to Antigravity cause strong frustration and erode trust.
  • Several people think Google “shipped the org chart”: too many overlapping AI products, none clearly best‑in‑class.

CLI, IDE, workflows, and lock‑in

  • Some dismiss complaints about the dropped VS Code plugin, arguing that the Antigravity CLI plus any editor is enough.
  • Others strongly value stable workflows and avoid tools that disrupt their editor/IDE setup or increase vendor lock‑in.
  • One insider describes Antigravity 2.0’s desktop app as an “agent management” shell around the CLI rather than a full IDE, a distinction some users felt Google had not communicated clearly.

Billing, quotas, and reliability

  • Multiple users report surprise quota cuts on paid AI plans, sometimes applied retroactively and with long reset periods, which prompted cancellations.
  • Users worry Google will sunset Antigravity or change its terms again; others counter that switching tools is cheap, so a shutdown would matter little.

Benchmark design and interpretation

  • Several argue the Pantheon test is not a robust benchmark: it is a single, highly specific 3D model, judged subjectively.
  • Others note it still exposes interesting differences between tools and shows that harness and agent design matter, not just the base model.
  • Some question whether the LLM is retrieving prior knowledge of the Pantheon (e.g., the interior coffers) rather than inferring the geometry from the images alone.
  • One commenter stresses that models are “jagged”: performance varies widely by task; a single example proves little.

CAD / OpenSCAD workflows

  • Many describe success using LLMs to generate OpenSCAD or similar code for simple functional parts, enclosures, and parametric libraries.
  • Others report failures on more complex, constraint‑driven or tolerance‑sensitive tasks (e.g., snap‑fits and vase‑mode prints) and with Fusion’s AI assistant.
  • There’s debate over OpenSCAD’s suitability (e.g., it approximates curves with polygons rather than representing true curves) and whether tools built on professional geometry kernels, such as CadQuery or FreeCAD (both based on OpenCASCADE), fit engineering workflows better.
  • Using reference images is widely seen as a big improvement over text‑only prompts.
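To make “parametric” concrete, here is a minimal sketch of the kind of simple enclosure commenters describe generating. It is written in Python and emits OpenSCAD source; it is not code from any commenter or from Antigravity. The helper name `enclosure_scad` and its parameters are hypothetical, and the emitted OpenSCAD uses only standard primitives (`difference()`, `cube()`, `translate()`).

```python
def enclosure_scad(width: float, depth: float, height: float, wall: float = 2.0) -> str:
    """Return OpenSCAD source for an open-top box with uniform wall thickness.

    Hypothetical helper for illustration only; dimensions are in millimetres.
    """
    # Reject walls that would leave no interior cavity.
    if wall <= 0 or 2 * wall >= min(width, depth) or wall >= height:
        raise ValueError("wall thickness must be positive and smaller than the box")
    inner_w = width - 2 * wall
    inner_d = depth - 2 * wall
    return "\n".join([
        f"// Open-top enclosure {width} x {depth} x {height} mm, wall {wall} mm",
        "difference() {",
        f"  cube([{width}, {depth}, {height}]);",
        "  // Subtract the interior; it extends past the top face, leaving the box open",
        f"  translate([{wall}, {wall}, {wall}])",
        f"    cube([{inner_w}, {inner_d}, {height}]);",
        "}",
    ])


if __name__ == "__main__":
    # Paste the printed source into OpenSCAD to preview the part.
    print(enclosure_scad(60, 40, 25, wall=2))
```

Because the whole part is driven by a few numbers, a different size is just a different call, which is the sort of simple, well‑specified geometry the success reports above describe.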

General vs specialized models

  • Some argue general multimodal models are inherently better for tasks like “model the Pantheon” because they need broad world knowledge.
  • Others wonder whether specialized CAD models combined with general models might do better; no consensus emerged.

LLMs, learning, and expectations

  • Several note that LLMs dramatically lower the barrier to learning CAD, Nix, new languages, etc., turning “too hard” skills into “just try it.”
  • There’s meta‑discussion about rapidly rising expectations: what was “magic” a few years ago is now nitpicked, yet many still express awe.
  • Examples like basic arithmetic failures (e.g., 300+140) are cited as evidence of jagged, non‑deterministic reasoning that limits trust for critical tasks.