2026-05-22

Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

Antigravity 2.0 product experience & Google UX

Many commenters find Antigravity 2.0 rough: some have to log in via the browser on every launch, credential caching breaks when no keyring or D-Bus is present, and the TUI behaves poorly during streaming output.
Others report that credential caching works and prefer it to Gemini CLI, suggesting OS‑level differences.
Fragmented billing, abrupt quota changes, and the forced migration from Gemini CLI to Antigravity generate strong frustration and loss of trust.
Several people think Google “shipped the org chart”: too many overlapping AI products, none clearly best‑in‑class.

CLI, IDE, workflows, and lock‑in

Some don’t understand complaints about dropping the VS Code plugin, arguing Antigravity CLI + any editor is enough.
Others strongly value stable workflows and avoid tools that disrupt their editor/IDE setup or increase vendor lock‑in.
One insider describes Antigravity 2.0’s desktop app as an “agent management” shell around the CLI, not a full IDE, which some users felt was under‑communicated.

Billing, quotas, and reliability

There are multiple reports of surprise quota cuts on paid AI plans, sometimes applied retroactively with long reset periods, which led some users to cancel.
Users worry Google will sunset Antigravity or change terms again; others say switching tools is cheap so sunsetting isn’t a big deal.

Benchmark design and interpretation

Several argue the Pantheon test is not a robust benchmark: it’s a single, highly specific 3D model, evaluated subjectively.
Others note it still exposes interesting differences between tools and shows how harness/agent design matters, not just the base model.
Some question whether the LLM is retrieving prior knowledge of the Pantheon (e.g., interior coffers) rather than inferring from images alone.
One commenter stresses that models are “jagged”: performance varies widely by task; a single example proves little.

CAD / OpenSCAD workflows

Many describe success using LLMs to generate OpenSCAD or similar code for simple functional parts, enclosures, and parametric libraries.
Others report failures on more complex, constraint‑driven or tolerance‑sensitive tasks (snap‑fits, vase mode, Fusion assistant).
There’s debate over OpenSCAD’s suitability (e.g., lack of true curves) and whether more professional tools like CadQuery/FreeCAD, which are built on the OpenCASCADE kernel, are better aligned with engineering workflows.
Using reference images is widely seen as a big improvement over text‑only prompts.
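
To illustrate the kind of simple parametric part commenters say works well, here is a minimal sketch in Python that emits OpenSCAD source for an open-top box enclosure. It is not taken from the thread: the function name `enclosure_scad` and its parameters are hypothetical, and it relies only on the standard OpenSCAD primitives `cube`, `translate`, and `difference`.

```python
def enclosure_scad(width, depth, height, wall=2.0):
    """Return OpenSCAD source for an open-top rectangular enclosure.

    Hypothetical helper for illustration; all dimensions are in millimetres.
    """
    if min(width, depth, height, wall) <= 0:
        raise ValueError("all dimensions must be positive")
    if 2 * wall >= min(width, depth) or wall >= height:
        raise ValueError("wall is too thick for the outer dimensions")

    # Size of the cavity that is subtracted from the solid outer block.
    inner_w = width - 2 * wall
    inner_d = depth - 2 * wall

    return "\n".join([
        "// Generated parametric enclosure (illustrative sketch)",
        "difference() {",
        f"    cube([{width}, {depth}, {height}]);",
        # The cavity starts one wall thickness above the floor and
        # extends past the top face, which leaves the box open.
        f"    translate([{wall}, {wall}, {wall}])",
        f"        cube([{inner_w}, {inner_d}, {height}]);",
        "}",
    ])


if __name__ == "__main__":
    print(enclosure_scad(60, 40, 25, wall=2))
```

The printed text can be pasted into OpenSCAD as-is. A part like this has no tolerances or mating features, which is roughly where commenters report that results start to degrade.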

General vs specialized models

Some argue general multimodal models are inherently better for tasks like “model the Pantheon” because such tasks need broad world knowledge.
Others wonder whether specialized CAD models combined with general models might do better; consensus is unclear.

LLMs, learning, and expectations

Several note that LLMs dramatically lower the barrier to learning CAD, Nix, new languages, etc., turning “too hard” skills into “just try it.”
There’s meta‑discussion about rapidly rising expectations: what was “magic” a few years ago is now nitpicked, yet many still express awe.
Examples like basic arithmetic failures (e.g., 300+140) are cited as evidence of jagged, non‑deterministic reasoning that limits trust for critical tasks.