Falcon 2
Benchmarks and Model Quality
- Falcon 2 11B is claimed (by its creators) to slightly outperform Llama 3 8B and match Gemma 7B on Hugging Face benchmark averages.
- Several commenters find this odd: in their experience Llama 3 8B generally outperforms Gemma 7B, and they suspect benchmark quirks or contamination.
- Others note the comparison is among base models; chat-tuned Llama 3 is seen as much better than Gemma chat, which may explain perception gaps.
- Commenters are skeptical of the benchmarking practice itself: an 11B model is not in the same size class as 7–8B models, automated benchmarks can be misleading, and Falcon wins only narrowly, on one metric.
Licensing and “Openness” Concerns
- The Falcon 2 11B license is a modified Apache 2.0 license that requires compliance with an Acceptable Use Policy (AUP), and that policy can be changed unilaterally.
- Commenters argue this undermines the claim of being open source and creates ongoing legal risk, since conditions can shift without notice.
- Enforceability is debated: some think retroactively changing terms for already-distributed weights is likely unenforceable or contradictory; others say the explicit clause may stand but makes the model too risky for serious users.
- Falcon 1 had earlier license “shenanigans”; Falcon 1 40B is Apache-licensed but seen as obsolete.
Comparisons and Real‑World Use
- Anecdotes: Llama 3 8B is widely praised as “exceptionally good for its size,” Gemma 7B chat often judged weak; CodeGemma, however, is considered strong for coding.
- Falcon 2 11B only reaching parity with Llama 3 8B and Mistral 7B, despite having more parameters, is seen as underwhelming.
- Earlier Falcon models (e.g., 180B) are recalled as heavily hyped but underperforming smaller open models.
Training Setup and Technical Notes
- Model card: trained on 1024 A100 40GB GPUs for ~2 months with 3D parallelism and FlashAttention 2.
- Stats cited: ~5T training tokens for Falcon 2 11B versus ~15T reported for Llama 3; some doubt that the extra parameters can offset the smaller token count.
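
To put the figures cited above in rough proportion, here is a back-of-the-envelope sketch in Python. It rests on several assumptions that are not from the thread: "~2 months" is taken as 60 days, parameter counts are rounded to the 11B and 8B in the model names, the ~15T token figure is applied to Llama 3 8B, and training compute is estimated with the common 6 × parameters × tokens rule of thumb. The function names are illustrative, and the results are order-of-magnitude estimates only.

```python
# Rough arithmetic on the training figures quoted in the discussion.
# All inputs are rounded; the 60-day duration and the 6*N*D compute
# rule of thumb are assumptions, not numbers from the model card.

def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for a run using num_gpus for the given number of days."""
    return num_gpus * days * 24

def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params

def approx_train_flops(tokens: float, params: float) -> float:
    """Approximate training compute via the 6 * N * D heuristic."""
    return 6 * params * tokens

# Rounded figures: Falcon 2 11B (~5T tokens), Llama 3 8B (~15T tokens).
falcon = {"params": 11e9, "tokens": 5e12}
llama = {"params": 8e9, "tokens": 15e12}

if __name__ == "__main__":
    print(f"Falcon 2 GPU-hours (assumed 60 days): {gpu_hours(1024, 60):,.0f}")
    for name, m in (("Falcon 2 11B", falcon), ("Llama 3 8B", llama)):
        ratio = tokens_per_param(m["tokens"], m["params"])
        flops = approx_train_flops(m["tokens"], m["params"])
        print(f"{name}: ~{ratio:,.0f} tokens/param, ~{flops:.1e} training FLOPs")
```

Under these assumptions, Llama 3 8B sees roughly four times as many tokens per parameter as Falcon 2 11B and about twice the estimated training compute, which is consistent with the doubts that the extra parameters make up for the smaller token budget.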
Marketing, Positioning, and Geopolitics
- Press claims like “only AI model with Vision-to-Language capabilities” are called out as clearly false given GPT‑4V, Claude, Gemini, LLaVA, etc.
- “Outperforms Llama 3” is viewed as clickbait, especially without addressing Llama 3 70B or LMSYS-style human preference rankings.
- Some see Falcon as a sovereignty/prestige/media project for the UAE rather than a purely commercial play, with mixed reactions to AI being developed by non-democratic states.