Mistral AI Releases Forge

Overall reaction to Forge

  • Many see Forge as an interesting, “smart” business move: building bespoke, domain‑specific models rather than competing at the absolute frontier.
  • Several are disappointed it’s “contact us” only, with no public pricing, no signup, no sample scripts/notebooks, and a very enterprise‑centric posture.
  • Some small‑company developers say tools like Forge make training and fine‑tuning feel more attainable than before.

Pretraining, fine-tuning, and RAG

  • Confusion over terminology: people debate what Mistral actually means by “pretraining” vs “post‑training”:
    • Likely “continued pretraining” on domain text plus SFT/RLHF, not training from scratch.
    • Some suggest the distinction may be full fine‑tuning vs lightweight PEFT/LoRA (a LoRA sketch follows this list).
  • Multiple posters question when pretraining/fine‑tuning is actually needed versus RAG.
  • One commenter declares “RAG is dead,” but several others strongly push back, saying retrieval (including vector search) is widely used and will remain important (a minimal retrieval sketch also follows below).
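
To make the full fine‑tuning vs PEFT/LoRA distinction concrete, here is a minimal sketch using the Hugging Face transformers and peft libraries; the checkpoint name is an illustrative open‑weights Mistral model, not anything Forge‑specific.

```python
# A sketch of full fine-tuning vs. LoRA, assuming Hugging Face
# transformers + peft. "mistralai/Mistral-7B-v0.1" is just an example
# open-weights checkpoint (loading it needs substantial RAM/GPU memory).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Full fine-tuning would update every one of these parameters.
full_params = sum(p.numel() for p in model.parameters())

# LoRA instead freezes the base weights and trains small low-rank
# adapter matrices injected into the attention projections.
config = LoraConfig(
    r=8,                                  # rank of the low-rank updates
    lora_alpha=16,                        # scaling factor for adapter output
    target_modules=["q_proj", "v_proj"],  # which submodules get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()   # typically well under 1% of full_params
```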

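And as a minimal illustration of the retrieval side: embed documents, rank them against a query by cosine similarity, and stuff the top hits into a prompt. The encoder model, documents, and prompt format here are all illustrative assumptions.

```python
# A minimal retrieval sketch: embed documents, rank by cosine similarity,
# and build a grounded prompt. Encoder, documents, and prompt format are
# illustrative; swap in whatever vector store and chat model you use.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Refunds are processed within 14 days of a return request.",
    "Enterprise plans include a dedicated support channel.",
    "The API rate limit is 100 requests per minute per key.",
]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q          # dot product == cosine on unit vectors
    return [docs[i] for i in np.argsort(-scores)[:k]]

question = "How fast are refunds?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would then go to whatever chat model the application uses.
```

Note that updating knowledge here just means encoding and appending new documents to the index, with no retraining, which is one reason commenters expect retrieval to stay relevant.
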
Model quality and product experience

  • Opinions on quality are split:
    • Some consider Mistral underrated and cost‑effective, with strengths cited in philosophical depth, OCR, and local use.
    • Others call the models “bottom floor” and say any frontier US model is better.
  • OCR quality is debated: some praise Mistral OCR (especially v3), while others report worse results than Claude with earlier versions.
  • Many complain about confusing model naming (e.g., Devstral variants), inconsistent docs, and fragmented API keys, reinforcing the sense that individual developers are not the main target.

Mistral’s strategic positioning (EU & enterprise)

  • Strong theme: Mistral as the “EU‑friendly” alternative, with data staying in the EU and self‑hosting options (a minimal self‑hosting sketch follows this list).
  • Some argue this non‑US status is a real moat for regulated sectors and European sovereignty concerns; others say most big EU companies still choose US models and that sovereignty talk often outpaces action.
  • There’s concern that Mistral still relies on US cloud providers, so political risk and “pull‑the‑plug” scenarios remain.
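
As a hedged sketch of what “self‑hosting” can look like in practice: serving an open‑weights Mistral checkpoint locally with vLLM, so prompts and outputs never leave your own infrastructure. The checkpoint name and library choice are assumptions for illustration.

```python
# A sketch of local serving with vLLM, assuming an open-weights Mistral
# checkpoint; nothing here calls an external API, so data stays on-prem.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarise GDPR data-residency requirements."], params)
print(outputs[0].outputs[0].text)
```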

Specialization, enterprise data, and technical challenges

  • Several see the future in specialized, mid‑sized models (fast, local, domain‑tuned) rather than ever‑larger general models; others argue general SOTA + good tooling is winning today.
  • Some believe proprietary enterprise data could be a strong moat; skeptics reply that real internal knowledge is messy, incomplete, and often lives in code and people, not clean documents.
  • Discussion touches on:
    • RL environments being hard to design correctly (a toy example follows this list).
    • Continuous learning via external knowledge bases and better “context efficiency” rather than constant retraining.
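
As a toy illustration of why RL environment design is tricky, here is a minimal Gymnasium‑style environment whose naively shaped reward can be gamed; everything in it is hypothetical.

```python
# A toy Gymnasium-style environment showing a classic reward-design pitfall.
# The intended goal: walk right along positions 0..10 and reach 10.
# The naive shaped reward pays +1 for any rightward step, so an agent can
# farm unbounded reward by pacing 8 -> 9 -> 8 -> 9 ... and never finishing.
import gymnasium as gym
from gymnasium import spaces

class LineEnv(gym.Env):
    """Hypothetical 1-D walk; reaching position 10 is the real objective."""

    def __init__(self):
        self.observation_space = spaces.Discrete(11)
        self.action_space = spaces.Discrete(2)  # 0 = left, 1 = right
        self.pos = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        self.pos = min(10, self.pos + 1) if action == 1 else max(0, self.pos - 1)
        # Pitfall: rewarding rightward *steps* rather than *progress*.
        # Fixes include rewarding only at the goal, shaping with a potential
        # function of position, or truncating episodes after a step limit
        # (note `truncated` below is always False -- a second pitfall).
        reward = 1.0 if action == 1 else 0.0
        terminated = self.pos == 10
        return self.pos, reward, terminated, False, {}
```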