Mercury: Ultra-fast language models based on diffusion

Hands-on impressions (speed, UX, behavior)

  • Many commenters tried the playground and described it as “insanely fast” or “almost instantaneous,” with full paragraphs appearing at once.
  • The “diffusion mode” visualization is seen as a neat but purely cosmetic animation, not a faithful view of internal steps.
  • Some report deterministic behavior even at higher temperatures, needing hacks (e.g., adding a UUID to the prompt) to get varied outputs.
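The determinism workaround mentioned above can be sketched in a few lines. Here `decorrelate` is a hypothetical helper name, and the nonce format is illustrative; the idea is simply that appending a random UUID makes each request a distinct input, which perturbs an otherwise deterministic sampler:

```python
import uuid

def decorrelate(prompt: str) -> str:
    """Append a random UUID so repeated calls send distinct inputs.

    A workaround for models that return identical outputs even at
    high temperature: the UUID is noise the model should ignore,
    but it changes the input enough to vary the sampled output.
    """
    return f"{prompt}\n\n[nonce: {uuid.uuid4()}]"

# Two calls now send different strings for the same question.
p1 = decorrelate("Write a haiku about diffusion models.")
p2 = decorrelate("Write a haiku about diffusion models.")
assert p1 != p2
```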

Quality, correctness, and hallucinations

  • Mixed reactions on capability: some found it “quite smart” and useful for quick coding help or small utilities (e.g., an MQTT topic matcher), while others reported “over 60% hallucinations” and weak logical reasoning (failing the letter-counting “stRawbeRRy” test and the Sally’s-sister riddle).
  • In coding, it can produce plausible but non-compiling or incorrect answers, similar to earlier LLM generations.
  • Odd failure modes were noted: near-endless test generation for a regex prompt, with test quality deteriorating into nonsense characters, plus classic mistakes such as misunderstanding bit shifts and 128-bit integers.
  • Several commenters stress that raw token prediction alone is not enough for reliability in code.
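The “MQTT matcher” mentioned above is the kind of small utility commenters asked the model to write. As a reference point, a correct version fits in about a dozen lines (this is my sketch of the standard wildcard semantics, not the model’s output): `+` matches exactly one topic level, `#` matches the remainder.

```python
def mqtt_match(pattern: str, topic: str) -> bool:
    """Match an MQTT topic against a subscription pattern.

    '+' matches exactly one level; '#' matches all remaining
    levels and must be the last element of the pattern.
    """
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, p in enumerate(p_parts):
        if p == "#":
            return True          # '#' swallows everything after it
        if i >= len(t_parts):
            return False         # pattern is longer than the topic
        if p != "+" and p != t_parts[i]:
            return False         # literal level must match exactly
    return len(p_parts) == len(t_parts)

assert mqtt_match("sensor/+/temp", "sensor/kitchen/temp")
assert mqtt_match("sensor/#", "sensor/kitchen/temp/raw")
assert not mqtt_match("sensor/+/temp", "sensor/kitchen/humidity")
```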

Diffusion vs. autoregressive LLMs

  • Diffusion is framed as coarse-to-fine vs. the start-to-end bias of autoregressive models; this directional difference may affect how they handle editing and “coding flows.”
  • Some see diffusion as especially promising for back-and-forth editing, multi-layer code representations, and potentially schema- or type-constrained generation.
  • Others compare to Gemini Diffusion: very fast but currently weaker than top conventional models, suggesting this is an early-quality, high-speed phase.
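To make the coarse-to-fine vs. start-to-end contrast concrete, here is a toy simulation (no real model involved) of the two fill orders. An autoregressive decoder commits tokens strictly left to right; a masked-diffusion decoder starts from an all-mask sequence and reveals positions in an arbitrary order, so the whole output sharpens at once:

```python
import random

MASK = "_"

def autoregressive_order(n: int) -> list[int]:
    # Left-to-right: position i is committed at step i.
    return list(range(n))

def diffusion_order(n: int, seed: int = 0) -> list[int]:
    # Coarse-to-fine toy: positions are revealed in a shuffled
    # order rather than growing from the left edge.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order

def reveal(target: str, order: list[int]) -> list[str]:
    """Return the sequence of partial strings as positions unmask."""
    chars = [MASK] * len(target)
    frames = []
    for i in order:
        chars[i] = target[i]
        frames.append("".join(chars))
    return frames

for frame in reveal("hello", diffusion_order(5)):
    print(frame)
```

Both orders end at the same string; the difference is which intermediate states exist, which is why diffusion looks promising for in-place editing while autoregression favors append-only flows.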

Ecosystem, pricing, and openness

  • Mercury’s API pricing is seen as decent but not market-leading; Groq and DeepInfra are cited as cheaper for some workloads, though sometimes slower or with higher latency.
  • Lack of open weights, undisclosed parameter counts, and a benchmark-heavy, light-on-details paper draw criticism; the arXiv tech report is viewed by some as bordering on marketing.
  • One commenter links Mercury to a scaled-up variant of existing discrete diffusion work and provides an educational reimplementation.

Impact on tooling, CI, and workflows

  • Many anticipate ultra-fast models enabling new paradigms (e.g., semantic grep over millions of HN comments, rapid multi-iteration code agents).
  • A long subthread argues that CI/testing, not model speed, will become the main bottleneck as agents generate far more code and PRs; the discussion ranges over CI cost, flakiness, caching, and architectural/test quality.
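The “semantic grep” idea above amounts to ranking comments by embedding similarity to a query instead of by literal string match. A minimal sketch, using a toy bag-of-words vector and cosine similarity in place of a real LLM embedding model (which is what an ultra-fast model would actually provide):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real semantic grep would
    call an LLM embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_grep(query: str, comments: list[str], k: int = 3) -> list[str]:
    """Return the k comments most similar to the query."""
    q = embed(query)
    ranked = sorted(comments, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

comments = [
    "diffusion models generate tokens in parallel",
    "my cat knocked over a plant",
    "autoregressive decoding is token by token",
]
print(semantic_grep("parallel token generation", comments, k=2))
```

Scaling this to millions of HN comments is exactly where per-query model speed starts to matter.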