Mercury: Ultra-fast language models based on diffusion

Hands-on impressions (speed, UX, behavior)

  • Many commenters tried the playground and described it as “insanely fast” or “almost instantaneous,” with full paragraphs appearing at once.
  • The “diffusion mode” visualization is seen as a neat but purely cosmetic animation, not a faithful view of internal steps.
  • Some report deterministic behavior even at higher temperatures, needing hacks (e.g., adding a UUID to the prompt) to get varied outputs.
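The determinism workaround mentioned above can be sketched in a few lines. Here `decorrelate` is a hypothetical helper name, and the nonce format is illustrative; the idea is simply that appending a random UUID makes each request a distinct input, which perturbs an otherwise deterministic sampler:

```python
import uuid

def decorrelate(prompt: str) -> str:
    """Append a random UUID so repeated calls send distinct inputs.

    A workaround for models that return identical outputs even at
    high temperature: the UUID is noise the model should ignore,
    but it changes the input enough to vary the sampled output.
    """
    return f"{prompt}\n\n[nonce: {uuid.uuid4()}]"

# Two calls now send different strings for the same question.
p1 = decorrelate("Write a haiku about diffusion models.")
p2 = decorrelate("Write a haiku about diffusion models.")
assert p1 != p2
```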

Quality, correctness, and hallucinations

  • Mixed reactions on capability: some found it “quite smart” and useful for quick coding help or small utilities (e.g., an MQTT topic matcher), while others reported “over 60% hallucinations” and weak logical reasoning (failing the letter-counting “stRawbeRRy” test and the Sally’s-sister riddle).
  • In coding, it can produce plausible but non-compiling or incorrect answers, similar to earlier LLM generations.
  • Odd failure modes were noted: near-endless test generation for a regex prompt, with test quality deteriorating into nonsense characters, plus classic mistakes such as misunderstanding bit shifts and 128-bit integers.
  • Several commenters stress that raw token prediction alone is not enough for reliability in code.
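The “MQTT matcher” mentioned above is the kind of small utility commenters asked the model to write. As a reference point, a correct version fits in about a dozen lines (this is my sketch of the standard wildcard semantics, not the model’s output): `+` matches exactly one topic level, `#` matches the remainder.

```python
def mqtt_match(pattern: str, topic: str) -> bool:
    """Match an MQTT topic against a subscription pattern.

    '+' matches exactly one level; '#' matches all remaining
    levels and must be the last element of the pattern.
    """
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, p in enumerate(p_parts):
        if p == "#":
            return True          # '#' swallows everything after it
        if i >= len(t_parts):
            return False         # pattern is longer than the topic
        if p != "+" and p != t_parts[i]:
            return False         # literal level must match exactly
    return len(p_parts) == len(t_parts)

assert mqtt_match("sensor/+/temp", "sensor/kitchen/temp")
assert mqtt_match("sensor/#", "sensor/kitchen/temp/raw")
assert not mqtt_match("sensor/+/temp", "sensor/kitchen/humidity")
```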

Diffusion vs. autoregressive LLMs

  • Diffusion is framed as coarse-to-fine vs. the start-to-end bias of autoregressive models; this directional difference may affect how they handle editing and “coding flows.”
  • Some see diffusion as especially promising for back-and-forth editing, multi-layer code representations, and potentially schema- or type-constrained generation.
  • Others compare to Gemini Diffusion: very fast but currently weaker than top conventional models, suggesting this is an early-quality, high-speed phase.
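To make the coarse-to-fine vs. start-to-end contrast concrete, here is a toy simulation (no real model involved) of the two fill orders. An autoregressive decoder commits tokens strictly left to right; a masked-diffusion decoder starts from an all-mask sequence and reveals positions in an arbitrary order, so the whole output sharpens at once:

```python
import random

MASK = "_"

def autoregressive_order(n: int) -> list[int]:
    # Left-to-right: position i is committed at step i.
    return list(range(n))

def diffusion_order(n: int, seed: int = 0) -> list[int]:
    # Coarse-to-fine toy: positions are revealed in a shuffled
    # order rather than growing from the left edge.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order

def reveal(target: str, order: list[int]) -> list[str]:
    """Return the sequence of partial strings as positions unmask."""
    chars = [MASK] * len(target)
    frames = []
    for i in order:
        chars[i] = target[i]
        frames.append("".join(chars))
    return frames

for frame in reveal("hello", diffusion_order(5)):
    print(frame)
```

Both orders end at the same string; the difference is which intermediate states exist, which is why diffusion looks promising for in-place editing while autoregression favors append-only flows.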

Ecosystem, pricing, and openness

  • Mercury’s API pricing is seen as decent but not market-leading; Groq and DeepInfra are cited as cheaper for some workloads, though sometimes slower or with higher latency.
  • Lack of open weights, undisclosed parameter counts, and a benchmark-heavy, light-on-details paper draw criticism; the arXiv tech report is viewed by some as bordering on marketing.
  • One commenter links Mercury to a scaled-up variant of existing discrete diffusion work and provides an educational reimplementation.

Impact on tooling, CI, and workflows

  • Many anticipate ultra-fast models enabling new paradigms (e.g., semantic grep over millions of HN comments, rapid multi-iteration code agents).
  • A long subthread argues that CI/testing, not model speed, will become the main bottleneck as agents generate far more code and PRs; the discussion ranges over CI cost, flakiness, caching, and architectural/test quality.
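The “semantic grep” idea above amounts to ranking comments by embedding similarity to a query instead of by literal string match. A minimal sketch, using a toy bag-of-words vector and cosine similarity in place of a real LLM embedding model (which is what an ultra-fast model would actually provide):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real semantic grep would
    call an LLM embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_grep(query: str, comments: list[str], k: int = 3) -> list[str]:
    """Return the k comments most similar to the query."""
    q = embed(query)
    ranked = sorted(comments, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

comments = [
    "diffusion models generate tokens in parallel",
    "my cat knocked over a plant",
    "autoregressive decoding is token by token",
]
print(semantic_grep("parallel token generation", comments, k=2))
```

Scaling this to millions of HN comments is exactly where per-query model speed starts to matter.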