Falcon 2

Benchmarks and Model Quality

  • Its creators claim Falcon 2 11B slightly outperforms Llama 3 8B and matches Gemma 7B on Hugging Face benchmark averages.
  • Several commenters find this ordering odd, since in their experience Llama 3 8B generally outperforms Gemma 7B; they suspect benchmark quirks or test-set contamination.
  • Others note the comparison is between base models; chat-tuned Llama 3 is seen as much better than Gemma chat, which may explain the gap in perception.
  • Commenters are skeptical of the benchmarking practice: 11B vs 7–8B is not a fair "same class" comparison, automated benchmarks can be misleading, and Falcon wins only narrowly, on a single metric.

Licensing and “Openness” Concerns

  • The Falcon 2 11B license is a modified Apache 2.0 that requires compliance with an Acceptable Use Policy (AUP), which the licensor can change unilaterally.
  • Commenters argue this undermines the claim of being open source and creates ongoing legal risk, since conditions can shift without notice.
  • Debate over enforceability: some think retroactively changing terms for already-distributed weights is likely unenforceable or contradictory; others say the explicit clause may stand but is too risky for serious users.
  • Falcon 1 also went through license "shenanigans"; Falcon 1 40B is Apache-licensed but seen as obsolete.

Comparisons and Real‑World Use

  • Anecdotes: Llama 3 8B is widely praised as "exceptionally good for its size," while Gemma 7B chat is often judged weak; CodeGemma, however, is considered strong for coding.
  • Merely matching Llama 3 8B and Mistral 7B despite having more parameters is seen as underwhelming for Falcon 2 11B.
  • Earlier Falcon models (e.g., 180B) are recalled as heavily hyped yet outperformed by smaller open models.

Training Setup and Technical Notes

  • Model card: trained on 1024 A100 40GB GPUs for ~2 months with 3D parallelism and FlashAttention 2.
  • Stats cited: ~5T training tokens versus ~15T reported for Llama 3; some doubt that the extra parameters can make up for a third as many tokens.
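
The token gap behind that doubt can be made concrete with back-of-envelope arithmetic. This is a rough sketch, not taken from either model card: it assumes "~2 months" means 60 days, uses nominal parameter counts of 11B and 8B, and estimates training compute with the common C ≈ 6·N·D rule of thumb (N parameters, D tokens), which is only an approximation.

```python
# Back-of-envelope comparison using the figures cited in the discussion.
# Assumptions (not from the model cards): "~2 months" = 60 days, nominal
# parameter counts of 11B and 8B, and the C ≈ 6·N·D heuristic for FLOPs.


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def approx_train_flops(tokens: float, params: float) -> float:
    """Rough training compute via the C ≈ 6·N·D rule of thumb."""
    return 6 * params * tokens


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total accelerator-hours for a run of the given length."""
    return num_gpus * days * 24


models = {
    "Falcon 2 11B": {"params": 11e9, "tokens": 5e12},
    "Llama 3 8B": {"params": 8e9, "tokens": 15e12},
}

if __name__ == "__main__":
    for name, m in models.items():
        ratio = tokens_per_param(m["tokens"], m["params"])
        flops = approx_train_flops(m["tokens"], m["params"])
        print(f"{name}: {ratio:,.0f} tokens/param, ~{flops:.1e} training FLOPs")

    # 1024 A100s for an assumed 60 days
    print(f"Falcon 2 11B run: ~{gpu_hours(1024, 60):,.0f} A100-hours")
```

Under these assumptions Falcon 2 11B saw roughly 455 tokens per parameter versus about 1,875 for Llama 3 8B, and Llama 3 8B received roughly twice the training compute (about 7.2e23 vs 3.3e23 FLOPs), which is the gap commenters doubt the extra 3B parameters can close.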

Marketing, Positioning, and Geopolitics

  • Press claims like “only AI model with Vision-to-Language capabilities” are called out as clearly false given GPT‑4V, Claude, Gemini, LLaVA, etc.
  • “Outperforms Llama 3” is viewed as clickbait, especially without addressing Llama 3 70B or LMSYS-style human preference rankings.
  • Some see Falcon as a sovereignty/prestige/media project for the UAE rather than a purely commercial play, with mixed reactions to AI being developed by non-democratic states.