Mistral AI Releases Forge
Overall reaction to Forge
- Many see Forge as an interesting, “smart” business move: bespoke and domain-specific models instead of competing at the absolute frontier.
- Several are disappointed it’s “contact us” only, with no public pricing, no signup, no sample scripts/notebooks, and a very enterprise‑centric posture.
- Some small‑company developers say tools like Forge make training and fine‑tuning feel more attainable than before.
Pretraining, fine-tuning, and RAG
- Confusion over terminology: commenters debate what Mistral means by “pretraining” vs “post‑training”:
  - Likely “continued pretraining” on domain text plus SFT/RLHF, not training from scratch.
  - Some suggest the distinction may be full fine‑tuning vs lightweight PEFT/LoRA.
- Multiple posters question when pretraining/fine‑tuning is actually needed versus RAG.
- One commenter declares “RAG is dead,” but several others strongly push back, saying retrieval (including vector search) is widely used and will remain important.
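The RAG the thread is arguing over reduces to a simple pattern: score stored documents against the query, then prepend the best matches to the prompt. A minimal sketch of that retrieval step, using a toy bag-of-words cosine similarity as a stand-in for a real embedding model (all names and the document list are illustrative, not Mistral's API):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts.
    # A real system would call an embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank all documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Forge lets enterprises fine-tune domain-specific models.",
    "RAG retrieves relevant documents and adds them to the prompt.",
    "Mistral OCR converts scanned pages into text.",
]

# The generation step then just sees the retrieved text as context:
context = retrieve("which documents are relevant to the user prompt", docs, k=1)
prompt = "Answer using this context:\n" + "\n".join(context)
```

The pushback in the thread is essentially that this pattern is cheap to update (add a document, no retraining) while fine-tuning bakes knowledge into weights, which is why many see the two as complementary rather than competing.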
Model quality and product experience
- Opinions on quality are split:
  - Some consider Mistral underrated, cost‑effective, good for philosophical depth, OCR, and local use.
  - Others call the models “bottom floor” and say any frontier US model is better.
- OCR quality is debated: some praise Mistral OCR (especially v3), others report worse results than Claude on earlier versions.
- Many complain about confusing model naming (e.g., Devstral variants), inconsistent docs, and fragmented API keys, reinforcing the sense that individual developers are not the main target.
Mistral’s strategic positioning (EU & enterprise)
- Strong theme: Mistral as the “EU‑friendly” alternative, with data staying in the EU and self‑hosting options.
- Some argue this non‑US status is a real moat for regulated sectors and European sovereignty concerns; others say most big EU companies still choose US models and that sovereignty talk often outpaces action.
- There’s concern that Mistral still relies on US cloud providers, so political risk and “pull‑the‑plug” scenarios remain.
Specialization, enterprise data, and technical challenges
- Several see the future in specialized, mid‑sized models (fast, local, domain‑tuned) rather than ever‑larger general models; others argue general SOTA + good tooling is winning today.
- Some believe proprietary enterprise data could be a strong moat; skeptics reply that real internal knowledge is messy, incomplete, and often lives in code and people, not clean documents.
- Discussion touches on:
  - RL environments being hard to design correctly.
  - Continuous learning via external knowledge bases and better “context efficiency” rather than constant retraining.
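The "continuous learning without retraining" idea above can be sketched in a few lines: new knowledge is appended to an external store and injected into the model's context at query time, so the weights never change. This is a hypothetical illustration (class names, facts, and the keyword-overlap scoring are all invented for the sketch; a real system would use embeddings):

```python
class KnowledgeBase:
    """External store standing in for 'continuous learning': facts are
    added over time and surfaced at inference, with no gradient updates."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def learn(self, fact: str) -> None:
        # "Learning" here is just an append; the model is untouched.
        self.facts.append(fact)

    def context_for(self, query: str, budget: int = 2) -> list[str]:
        # Crude keyword-overlap scoring; `budget` caps how many facts
        # enter the prompt, the "context efficiency" concern in the thread.
        words = set(query.lower().split())
        scored = sorted(
            self.facts,
            key=lambda f: len(words & set(f.lower().split())),
            reverse=True,
        )
        return scored[:budget]

kb = KnowledgeBase()
kb.learn("Invoice approvals above 10k EUR require CFO sign-off.")
kb.learn("The staging cluster is rebuilt every Sunday night.")
kb.learn("Customer IDs are prefixed with 'C-' in the billing system.")

prompt_context = kb.context_for("who must sign off invoice approvals", budget=1)
```

The design trade-off the skeptics raise still applies: this only works if the facts are written down cleanly in the first place, which is exactly what they say messy internal enterprise knowledge often is not.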