2024-05-13

Falcon 2

Benchmarks and Model Quality

Its creators claim that Falcon 2 11B slightly outperforms Llama 3 8B and matches Gemma 7B on Hugging Face benchmark averages.
Several commenters find this odd, saying Llama 3 8B generally outperforms Gemma 7B in their experience and suspect benchmark quirks or contamination.
Others note the comparison is among base models; chat-tuned Llama 3 is seen as much better than Gemma chat, which may explain perception gaps.
There is skepticism about the benchmarking practice: pitting an 11B model against 7–8B models is not a fair “same class” comparison; automated benchmarks can be misleading; and Falcon wins only narrowly, on a single metric.

Licensing and “Openness” Concerns

The Falcon 2 11B license is a modified Apache 2.0 license that requires compliance with an Acceptable Use Policy (AUP), which the licensor can change unilaterally.
Commenters argue this undermines the claim of being open source and creates ongoing legal risk, since conditions can shift without notice.
Debate over enforceability: some think retroactively changing terms for already-distributed weights is likely unenforceable or contradictory; others say the explicit clause may stand but is too risky for serious users.
Falcon 1 had earlier license “shenanigans”; Falcon 1 40B is Apache-licensed but seen as obsolete.

Comparisons and Real‑World Use

Anecdotes: Llama 3 8B is widely praised as “exceptionally good for its size,” Gemma 7B chat often judged weak; CodeGemma, however, is considered strong for coding.
Falcon 2 11B merely matching Llama 3 8B and Mistral 7B despite having more parameters is seen as underwhelming.
Earlier Falcon models (e.g., 180B) are remembered as heavily hyped but outperformed by smaller open models.

Training Setup and Technical Notes

Model card: trained on 1024 A100 40GB GPUs for ~2 months with 3D parallelism and FlashAttention 2.
Cited figures: ~5T training tokens for Falcon 2 11B versus ~15T reported for Llama 3; some doubt that the extra parameters can offset the smaller training set.
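A rough back-of-envelope sketch makes the token-count concern concrete. It assumes "~2 months" means 60 days, uses the nominal 11B and 8B parameter counts, and applies the common ~6·N·D rule of thumb for dense transformer training compute; none of these are official figures from either model card.

```python
# Back-of-envelope comparison using the approximate figures quoted above.
# Assumptions: 60 days for "~2 months", nominal parameter counts, and the
# ~6 * params * tokens approximation for dense transformer training FLOPs.

def gpu_hours(num_gpus: int, days: float) -> float:
    """Total accelerator time in GPU-hours."""
    return num_gpus * days * 24


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def approx_train_flops(tokens: float, params: float) -> float:
    """Rough training compute via the ~6*N*D rule of thumb."""
    return 6 * params * tokens


# name: (parameters, training tokens)
models = {
    "Falcon 2 11B": (11e9, 5e12),
    "Llama 3 8B": (8e9, 15e12),
}

if __name__ == "__main__":
    print(f"Falcon 2 GPU-hours (1024 GPUs, 60 days): {gpu_hours(1024, 60):,.0f}")
    for name, (n, d) in models.items():
        print(
            f"{name}: {tokens_per_param(d, n):,.0f} tokens/param, "
            f"~{approx_train_flops(d, n):.1e} training FLOPs"
        )
```

Under these assumptions Falcon 2 11B sees roughly 455 tokens per parameter versus about 1,875 for Llama 3 8B, and its estimated training compute (~3.3e23 FLOPs) is less than half of Llama 3 8B's (~7.2e23), which is why the extra parameters may not make up the difference.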

Marketing, Positioning, and Geopolitics

Press claims like “only AI model with Vision-to-Language capabilities” are called out as clearly false given GPT‑4V, Claude, Gemini, LLaVA, etc.
“Outperforms Llama 3” is viewed as clickbait, especially since the claim ignores Llama 3 70B and LMSYS-style human preference rankings.
Some see Falcon as a sovereignty/prestige/media project for the UAE rather than a purely commercial play, with mixed reactions to AI being developed by non-democratic states.
