Mercury: Ultra-fast language models based on diffusion
Hands-on impressions (speed, UX, behavior)
- Many commenters tried the playground and described it as “insanely fast” or “almost instantaneous,” with full paragraphs appearing at once.
- The “diffusion mode” visualization is seen as a neat but purely cosmetic animation, not a faithful view of internal steps.
- Some report deterministic behavior even at higher temperatures, needing hacks (e.g., adding a UUID to the prompt) to get varied outputs.
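The nonce trick mentioned above can be sketched in a few lines. This is a hypothetical helper, not Mercury's API; it only shows the prompt wrapping, with the actual model call left to whatever client you use:

```python
import uuid

def with_nonce(prompt: str) -> str:
    """Append a random UUID so repeated, otherwise-identical requests
    send distinct prompts — a workaround when sampling looks deterministic.
    (Hypothetical helper; pass the result to your model client of choice.)"""
    return f"{prompt}\n\n[nonce: {uuid.uuid4()}]"

# Two calls with the same user prompt now differ at the byte level:
print(with_nonce("Write a haiku about the sea."))
print(with_nonce("Write a haiku about the sea."))
```

Since the nonce lands in the prompt itself, it perturbs the model's conditioning even when temperature alone does not produce variation.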
Quality, correctness, and hallucinations
- Mixed reactions on capability: some found it “quite smart” and useful for quick coding help or small utilities (e.g., an MQTT matcher), while others reported “over 60% hallucinations” and weak logical reasoning (failing the “stRawbeRRy” letter-counting and Sally’s-sister puzzles).
- In coding, it can produce plausible but non-compiling or incorrect answers, similar to earlier LLM generations.
- Odd failure modes were noted: seemingly endless test generation for a regex prompt, test quality that deteriorated as generation went on, nonsense characters, and classic issues like misunderstanding bit shifts and 128-bit integers.
- Several commenters stress that raw token prediction alone is not enough for reliability in code.
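For context on the “stRawbeRRy” test referenced above: it asks a model to count occurrences of a letter in a word, something that is trivial in code but that token-based models routinely get wrong. A one-liner shows the ground truth:

```python
def count_char(word: str, ch: str) -> int:
    """Case-insensitive count of a character in a word —
    trivial deterministically, yet a classic LLM stumbling block."""
    return word.lower().count(ch.lower())

print(count_char("stRawbeRRy", "r"))  # → 3
```

The gap between this three-line function and a model's answer is precisely the commenters' point: raw next-token prediction does not guarantee this kind of exact, verifiable correctness.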
Diffusion vs. autoregressive LLMs
- Diffusion is framed as coarse-to-fine vs. the start-to-end bias of autoregressive models; this directional difference may affect how they handle editing and “coding flows.”
- Some see diffusion as especially promising for back-and-forth editing, multi-layer code representations, and potentially schema- or type-constrained generation.
- Others compare to Gemini Diffusion: very fast but currently weaker than top conventional models, suggesting this is an early-quality, high-speed phase.
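The coarse-to-fine idea can be illustrated with a toy decoder that starts from a fully masked sequence and reveals the most “confident” positions each step, in contrast to strict left-to-right generation. This is an illustrative sketch only, not Mercury's actual algorithm: the denoiser is faked with random confidences, and revealed tokens are simply copied from a target string.

```python
import random

MASK = "_"

def fake_confidences(tokens):
    # Stand-in for a denoiser's per-position confidence scores;
    # a real model would predict these from the partial sequence.
    return {i: random.random() for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(target, steps=4):
    """Coarse-to-fine decoding: each step unmasks the k highest-confidence
    positions anywhere in the sequence (toy illustration only)."""
    tokens = [MASK] * len(target)
    per_step = max(1, len(target) // steps)
    while MASK in tokens:
        conf = fake_confidences(tokens)
        for i, _ in sorted(conf.items(), key=lambda kv: -kv[1])[:per_step]:
            tokens[i] = target[i]  # pretend the denoiser guessed correctly
        print("".join(tokens))    # intermediate states fill in out of order
    return "".join(tokens)

random.seed(0)
diffusion_decode("hello world")
```

The printed intermediate states show tokens appearing at arbitrary positions rather than strictly left to right, which is why commenters see diffusion as a better fit for in-place editing than autoregressive decoding.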
Ecosystem, pricing, and openness
- Mercury’s API pricing is seen as decent but not market-leading; Groq and DeepInfra are cited as cheaper for some workloads, though sometimes with lower throughput or higher latency.
- Lack of open weights, undisclosed parameter counts, and a benchmark-heavy, light-on-details paper draw criticism; the arXiv tech report is viewed by some as bordering on marketing.
- One commenter links Mercury to a scaled-up variant of existing discrete diffusion work and provides an educational reimplementation.
Impact on tooling, CI, and workflows
- Many anticipate ultra-fast models enabling new paradigms (e.g., semantic grep over millions of HN comments, rapid multi-iteration code agents).
- A long subthread argues that CI/testing, not model speed, will become the main bottleneck as agents generate far more code and PRs; this prompts extensive discussion of CI cost, flakiness, caching, and architectural and test quality.
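The “semantic grep” idea mentioned above can be sketched with a toy similarity search. Real systems would use a neural embedding model; here, as a stand-in, character-trigram bags and cosine similarity rank comments against a query (all names and data below are illustrative):

```python
from collections import Counter
from math import sqrt

def trigrams(text):
    # Bag of character trigrams — a crude stand-in for a learned embedding.
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_grep(query, comments, top_k=2):
    """Return the top_k comments ranked by similarity to the query."""
    q = trigrams(query)
    return sorted(comments, key=lambda c: cosine(q, trigrams(c)), reverse=True)[:top_k]

comments = [
    "Diffusion models are blazingly fast",
    "My cat knocked over the coffee",
    "Token-by-token decoding feels slow",
]
print(semantic_grep("fast diffusion decoding", comments, top_k=1))
```

Scaling this to millions of comments is exactly where an ultra-fast model helps: either as the embedder or as a reranker over candidate matches, where per-query latency dominates the workflow.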