Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark
Antigravity 2.0 product experience & Google UX
- Many commenters find Antigravity 2.0 rough: some must log in through the browser on every launch, credential caching breaks when no keyring or D‑Bus is available, and the TUI behaves poorly while output is streaming.
- Others report that credential caching works for them and prefer it to Gemini CLI, which suggests the failures depend on the OS environment.
- Fragmented billing, abrupt quota changes, and the forced migration from Gemini CLI to Antigravity cause strong frustration and erode trust.
- Several people think Google “shipped the org chart”: too many overlapping AI products, none clearly best‑in‑class.
CLI, IDE, workflows, and lock‑in
- Some don’t understand complaints about dropping the VS Code plugin, arguing that the Antigravity CLI plus any editor is enough.
- Others strongly value stable workflows and avoid tools that disrupt their editor/IDE setup or increase vendor lock‑in.
- One insider describes Antigravity 2.0’s desktop app as an “agent management” shell around the CLI rather than a full IDE, a distinction some users felt Google communicated poorly.
Billing, quotas, and reliability
- Multiple users report surprise quota cuts on paid AI plans, sometimes applied retroactively and with long reset periods, which led some to cancel.
- Users worry Google will sunset Antigravity or change terms again; others say switching tools is cheap so sunsetting isn’t a big deal.
Benchmark design and interpretation
- Several argue the Pantheon test is not a robust benchmark: it is a single, highly specific modeling task, judged subjectively.
- Others note it still exposes interesting differences between tools and shows how harness/agent design matters, not just the base model.
- Some question whether the model is retrieving prior knowledge of the Pantheon (e.g., interior coffers) rather than inferring from images alone.
- One commenter stresses that models are “jagged”: performance varies widely by task; a single example proves little.
CAD / OpenSCAD workflows
- Many describe success using LLMs to generate OpenSCAD or similar code for simple functional parts, enclosures, and parametric libraries.
- Others report failures on more complex, constraint‑driven or tolerance‑sensitive tasks (e.g., snap‑fits and vase‑mode parts), including with Fusion’s assistant.
- There’s debate over OpenSCAD’s suitability (e.g., its lack of true curves) and whether more professional tools such as CadQuery or FreeCAD are better aligned with engineering workflows.
- Using reference images is widely seen as a big improvement over text‑only prompts.
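The “lack of true curves” point can be made concrete. OpenSCAD approximates circles and cylinders with regular polygons whose fragment count is controlled by special variables such as $fn. The minimal Python sketch below, assuming the polygon’s vertices lie on the true circle (an inscribed polygon), estimates the worst‑case gap between the facets and the ideal curve. The function names are hypothetical illustrations, not part of any OpenSCAD API.

```python
import math


def facet_error(radius: float, fragments: int) -> float:
    """Worst-case gap between a true circle and an inscribed regular polygon.

    The largest deviation occurs at the midpoint of each edge and equals
    the sagitta: r * (1 - cos(pi / n)).
    """
    if fragments < 3:
        raise ValueError("a polygon needs at least 3 fragments")
    return radius * (1.0 - math.cos(math.pi / fragments))


def fragments_for_tolerance(radius: float, tolerance: float) -> int:
    """Smallest fragment count whose facet error stays within `tolerance`."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    n = 3
    # Error shrinks monotonically as n grows, so a linear search suffices.
    while facet_error(radius, n) > tolerance:
        n += 1
    return n


if __name__ == "__main__":
    r = 10.0  # radius in mm (hypothetical part)
    for n in (12, 30, 64, 128):
        print(f"$fn={n:>3}: max deviation {facet_error(r, n):.4f} mm")
    print("fragments needed for 0.01 mm:", fragments_for_tolerance(r, 0.01))
```

By this estimate, a 10 mm radius at 64 fragments deviates by roughly 0.012 mm, and holding 0.01 mm takes 71 fragments; the trade‑off is more geometry to render, whereas kernels with exact curve representations avoid the approximation entirely.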
General vs specialized models
- Some argue general multimodal models are inherently better for tasks like “model the Pantheon” because they need broad world knowledge.
- Others wonder whether specialized CAD models combined with general models might do better; there is no clear consensus.
LLMs, learning, and expectations
- Several note that LLMs dramatically lower the barrier to learning CAD, Nix, new languages, etc., turning “too hard” skills into “just try it.”
- There’s meta‑discussion about rapidly rising expectations: what was “magic” a few years ago is now nitpicked, yet many still express awe.
- Basic arithmetic failures (e.g., on 300+140) are cited as evidence of jagged, non‑deterministic reasoning that limits trust in LLMs for critical tasks.