Phi 4 available on Ollama
Availability, Formats, and Bug Fixes
- Phi-4 is now an official Ollama model; community ports existed earlier, including versions with Unsloth’s bug fixes.
- Some GGUF builds on Hugging Face had inference errors due to Phi-4’s architecture diverging from Phi-3.5 while reusing the “phi3” identifier; Ollama’s build adjusts hyperparameters to avoid this.
- Users can pull GGUFs directly from Hugging Face into Ollama (e.g., specifying a quantization tag such as :Q8_0), but nontrivial models (vision, special schemas) may need custom Modelfiles.
- Future Ollama releases are expected to resolve the GGUF hyperparameter error generally.
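Pulling a Hugging Face GGUF is a one-liner (e.g., `ollama run hf.co/<user>/<repo>:Q8_0`); for models that need a custom prompt template or parameters, a Modelfile wraps the GGUF. A minimal sketch is below; the file path, template, and parameter values are illustrative, not Phi-4's official ones (check the model card for the real chat format):

```
# Hypothetical Modelfile wrapping a locally downloaded GGUF.
FROM ./phi-4-Q8_0.gguf

# Prompt template (placeholder format; use the one from the model card)
TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
"""

PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

Build and run it with `ollama create my-phi4 -f Modelfile` followed by `ollama run my-phi4`.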
Quality, Benchmarks, and Evaluation Methods
- Several users say earlier Phi models underperformed relative to benchmarks, but report Phi-4 (14B) as a major step up, “GPT‑4-class” for many tasks and strong in languages like Japanese.
- One benchmark on the top 1,000 StackOverflow questions ranked Phi-4 3rd, above GPT‑4 and Claude 3.5 Sonnet, but it used Mixtral 8x7B as an automated judge, which is controversial.

- Critics argue LLM-as-judge tends to favor its own lineage and insist human evaluation is the only solid standard; others counter that LLM grading plus user votes is “good enough” for relative model ranking.
- Phi-4 scores relatively poorly on IFEval (instruction-following with strict constraints), flagged as a concern for constrained outputs.
- A separate case study shows Phi-4 can match GPT‑4o’s decisions ~97% of the time on a complex task when given high-quality few-shot examples, vs ~37% without few-shot.
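The ~37% → ~97% agreement gap in that case study comes down to embedding worked examples in the prompt. A minimal sketch of few-shot prompt assembly is below; the function, task, and labels are illustrative, not taken from the case study itself:

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: task instruction, worked examples,
    then the new case left open for the model to complete.

    `examples` is a list of (input, decision) pairs.
    """
    parts = [instruction.strip(), ""]
    for i, (inp, decision) in enumerate(examples, 1):
        parts.append(f"Example {i}:")
        parts.append(f"Input: {inp}")
        parts.append(f"Decision: {decision}")
        parts.append("")
    # End with an unanswered case so the model emits only the decision.
    parts.append(f"Input: {query}")
    parts.append("Decision:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify each support ticket as REFUND or ESCALATE.",
    [("Item arrived broken", "REFUND"),
     ("Legal threat in message", "ESCALATE")],
    "Package never delivered",
)
```

The key design point, per the case study, is example quality: a handful of carefully chosen, correctly labeled examples moved a 14B model close to GPT‑4o's decisions.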
Local Performance and Ecosystem
- Multiple users are “blown away” that GPT‑4-like models now run locally (e.g., on M1/M2/M3 Macs with ≥16 GB RAM), though speeds vary and some report issues (e.g., blank outputs on certain setups).
- Phi-4’s 14B size plus strong reasoning is seen as a turning point for practical local NLP, RAG, and coding assistance; users compare it favorably to Qwen 2/2.5 and Llama 3.3 70B.
- Some express dissatisfaction with Ollama/llama.cpp (limited multimodal support, no Vulkan in Ollama) and are exploring vLLM as an alternative.
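A back-of-envelope check explains why a 14B model fits on a 16 GB Mac: quantized weight size is roughly parameters × bits-per-weight / 8, before KV-cache and runtime overhead. The bit rates below are approximations (actual GGUF files vary slightly by quantization scheme):

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-file size in GB (1e9 bytes): params x bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9

# Phi-4 has ~14B parameters. A Q4_K_M-style quant averages roughly
# 4.5 bits per weight; Q8_0 is 8 bits per weight.
q4 = round(quantized_weight_gb(14e9, 4.5), 1)  # ~7.9 GB
q8 = round(quantized_weight_gb(14e9, 8.0), 1)  # ~14 GB
```

So a 4-bit quant leaves headroom on a 16 GB machine, while Q8_0 is a tight fit, which matches reports of variable speeds and failures on marginal setups.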
Business, Strategy, and Licensing
- Phi-4 is MIT-licensed and available via OpenRouter, enabling cheap hosted access and easy self-hosting.
- Discussion suggests major cloud providers see models as increasingly commoditized and focus on infra and integrated products, contrasting with OpenAI’s more closed approach.
- Some view Microsoft’s open releases as a hedge against OpenAI and evidence that proprietary model moats are weak; others note these are “non-SOTA” but still strategically useful.
Technical Design, Training Data, and Legality
- Phi-4’s strong performance despite its size is attributed (per its technical report) to highly curated, largely synthetic data (textbooks, problem sets) instead of massive web dumps.
- This raises the question of whether training avoided copyright infringement; responses note that legality is unclear and may hinge on “fair use,” regardless of user perception.
Structured Outputs and Practical Use
- Ollama recently added structured output support; users report it works reasonably if schemas are simple, though not as robust as OpenAI-style constrained decoding.
- Third-party tools (e.g., BAML) are cited as improving JSON reliability across providers.
- Some minor quirks are noted (e.g., Markdown code fencing styles), possibly reflecting training data habits.
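Ollama's structured output support works by passing a JSON schema in the request's `format` field (previously this field only accepted the string "json"). A sketch of building such a request body is below, keeping the schema flat as commenters advise; the model name and schema are illustrative:

```python
def chat_request(model: str, prompt: str, schema: dict) -> dict:
    """Build an Ollama /api/chat request body with a JSON-schema `format`.

    With a schema in `format`, Ollama constrains decoding so the reply
    parses as JSON matching the schema; simple, flat schemas are the
    most reliable.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": schema,   # a JSON schema dict, not just "json"
        "stream": False,
    }

schema = {
    "type": "object",
    "properties": {
        "language": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["language", "confidence"],
}

body = chat_request("phi4", "What language is 'bonjour'? Reply as JSON.",
                    schema)
```

POSTing this body to a local Ollama instance (`http://localhost:11434/api/chat`) should yield a message whose content parses against the schema; tools like BAML add retry and repair logic on top of this for providers without native constrained decoding.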
Broader Societal and Future Concerns
- Several comments marvel at the pace: powerful local models, high-quality image/video generation, and imminent voice-to-voice assistants.
- There is sharp disagreement on long-term impacts: some expect tools that augment humans; others foresee severe job displacement, social instability, and AI-enabled weapons development.
- Many anticipate AI becoming a generic “feature” in all products rather than a standalone destination, which may challenge API-centric businesses.