2024-05-29

Codestral: Mistral's Code Model

Model quality vs. existing tools (Copilot, GPT‑4/4o, others)

Mixed impressions vs. GitHub Copilot: some say Codestral is “miles better” and fast enough to stop using GPT‑4; others report serious hallucinations (e.g., made‑up SDKs).
Several commenters compare the models indirectly via benchmarks, claiming Codestral slightly beats Llama 3‑70B; Copilot is said to rely mostly on GPT‑3.5 for completions, which some consider outclassed.
Against GPT‑4o: some find Codestral a bit weaker overall; others prefer Codestral’s consistency and lower hallucination rate, criticizing GPT‑4o’s failures and repetition on long outputs.

Usefulness and limitations for coding

Works well for boilerplate, refactoring, and explaining or modifying existing code; less reliable for complex, multi‑constraint tasks (e.g., intricate ASGI middleware, tricky multi‑tenant schemas, Rust lifetimes).
Several note that expecting perfect one‑shot solutions is unrealistic; they recommend iterative prompting, writing specs first, and diff‑based workflows.
Some use personal “challenge prompts” (hard Python/Rust/Node tasks) as informal benchmarks and report most models still fail them.
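As an illustration of the diff‑based workflow mentioned above, here is a minimal sketch that renders a model's proposed rewrite as a unified diff for human review instead of accepting the whole output blindly. The function name `review_diff`, the file name `snippet.py`, and both code strings are hypothetical examples, not taken from the discussion.

```python
import difflib


def review_diff(original: str, proposed: str, name: str = "snippet.py") -> str:
    """Return a unified diff between the current code and a model's proposal."""
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff_lines)


# Hypothetical example: the model proposes a one-line bug fix.
original = "def area(w, h):\n    return w + h\n"
proposed = "def area(w, h):\n    return w * h\n"

print(review_diff(original, proposed))
```

Reviewing only the changed lines keeps each iteration small, which fits the iterative‑prompting advice: reject or amend the diff, then prompt again.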

Local deployment, hardware, and quantization

Raw FP16 weights for the 22B‑parameter model take ≈44 GB (22B parameters × 2 bytes each), plus extra for the KV cache and activations.
The unquantized model therefore needs ~50 GB of memory, too much for most single GPUs, but quantized versions (e.g., 4‑bit ≈11 GB) fit on 24 GB cards like the RTX 3090/4090 and on high‑RAM Macs.
Discussion of Apple Silicon vs. PC+Nvidia: Macs praised for unified memory capacity; PCs for cost, flexibility, and Linux support.
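The memory figures above follow from simple arithmetic: parameter count times bits per weight. The sketch below reproduces them under the assumption that only the weights are counted; the KV cache, activations, and the per‑block metadata that real quantization formats add are ignored, so actual files and runtime usage will be somewhat larger. The function name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 1e9


N_PARAMS = 22e9  # parameter count quoted in the discussion

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

This yields 44 GB for FP16 and 11 GB for 4‑bit, matching the numbers quoted in the thread, and shows why a 4‑bit quantization leaves headroom on a 24 GB card while FP16 does not fit.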

Licensing, “open‑weight,” and legal/ethical debate

The license (Mistral AI Non‑Production License, MNPL) is non‑production: it allows research, testing, and some “development” use, but bans commercial and most “live” uses, including internal business usage.
Many see it as “demoware” or “weights available,” not open source, and worry that it is practically unusable for companies compared with more permissively licensed models like Llama.
Strong criticism of asymmetry: community code is used for training, but model outputs are tightly restricted. Others argue legality and copyright implications are unsettled and jurisdiction‑dependent.

Ecosystem, tooling, and business model questions

People seek VS Code/IDE plugins that support Codestral via generic backends (Ollama, Continue, LlamaCoder, Cody, Tabnine).
Some view this as a viable business model: free non‑commercial weights plus paid API/commercial licenses; others doubt it can compete with cheaper, stronger proprietary models and ubiquitous Copilot.
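For the generic‑backend route mentioned above, editor plugins typically talk to a local model server over HTTP. The sketch below shows roughly what such a call looks like against Ollama's `/api/generate` endpoint using only the Python standard library. It assumes an Ollama server on its default local port and that the model was pulled under the tag `codestral`; both are assumptions about the local setup, and the helper names `build_request` and `complete` are invented for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: the model was pulled under this tag.
MODEL_NAME = "codestral"


def build_request(prompt: str, model: str = MODEL_NAME,
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `complete("Write a function that reverses a string.")` only works with the server running and the model downloaded. Note that the license restrictions discussed earlier apply to locally hosted weights as well.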

Broader impact on programming

Opinions split: some see LLMs as democratizing coding and boosting productivity; others fear skill atrophy, poor debugging ability, and an influx of low‑quality “AI garbage” code and libraries.
