Qwen2 LLM Released
Tiny and Small Models (0.5B–3.8B)
- The 0.5B Qwen2 model, with its 32k context window, is seen as interesting mainly as a base for fine-tuning or embeddings, not as a strong out-of-the-box chat model.
- Opinions diverge: some call sub-500M models “pretty much useless” for summarization; others report they work well when fine-tuned on classic NLP tasks (classification, labeling), potentially replacing BERT/RoBERTa/BART-style models.
- Suggested uses: speculative decoding to speed up larger models (the small model drafts tokens that the larger one verifies); predictive keyboards; text completion; compression; OCR/speech disambiguation, where imperfect “hinting” is acceptable.
- Several note that summarization, especially over long context, is hard even for larger models.
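The speculative decoding idea mentioned above can be sketched roughly as follows. This is a simplified greedy variant, not how any particular inference library implements it: `toy_target` and `toy_draft` are hypothetical stand-ins for a large and a small model, and a real system would verify all drafted tokens in a single batched forward pass of the large model rather than one call per token.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

def greedy_decode(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: call the (expensive) target model once per new token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target checks them."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target verifies the proposal position by position.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, drop the rest of the draft.
                accepted.append(expected)
                break
        else:
            # Every drafted token matched; the target contributes one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + max_new]

# Hypothetical toy models over digits 0-9 (assumptions for illustration only).
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10

def toy_draft(prefix: List[int]) -> int:
    # Agrees with the target except right after a 5, where it guesses wrong.
    return 0 if prefix[-1] == 5 else (prefix[-1] + 1) % 10
```

Because every emitted token is one the target itself would have chosen, the output matches plain greedy decoding with the target; the draft only affects how many target steps can be batched together, which is why an imperfect small model is acceptable here.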
Practical Use Cases for Small LLMs
- Emphasis on on-device, background automation rather than chat:
  - Meeting transcription → summaries, key topics, action items, speaker attribution.
  - Notification and note summarization, auto-titles, tag suggestions, context-aware quick replies.
  - In-browser data extraction (e.g., job postings into structured fields) with larger models orchestrating smaller ones.
Performance, Benchmarks, and Comparisons
- Qwen2-72B is reported (by its authors) to outperform Llama 3 70B on many benchmarks; some call this plausible, others distrust self-reported numbers and prefer community leaderboards (e.g., LMSYS Chatbot Arena).
- The thread references newer benchmarks (MMLU-Pro, MixEval, Arena Hard, LiveCodeBench) meant to address saturation and overfitting in older tests.
- Debate over whether progress is plateauing: some say compute is the limiting factor; others point to unreleased larger models and continuing gains.
- The Qwen2 MoE model (57B total parameters, ~14B active per token) is seen as a strong “middle-size” option; comparisons are drawn to Mixtral and Yi.
Licensing and “Open Source” Debate
- Praise for Apache 2.0 licensing on most Qwen2 models; 72B uses an older, more restrictive license but is still considered relatively permissive.
- Heated debate over calling such models “open source”:
  - One side: models with Apache 2.0 weights are “open source” even if training data is closed.
  - Other side: without open training data/recipe, these are “open weights” or “freeware,” not true open source.
- Some argue that open weights are still highly valuable for fine-tuning, interpretability, and model merging, even without full data transparency.
Censorship, Alignment, and Safety
- Users report errors or dropped responses when asking about Tiananmen Square and Chinese politics in hosted demos.
- Others note that local runs of the 7B model can answer these topics, suggesting censorship or instability in the online service rather than in the raw weights.
- Alignment around political topics appears inconsistent: sometimes refusals, sometimes partial or contradictory answers.
Training Infrastructure and Data Practices
- Curiosity about how Chinese companies train large models under GPU export restrictions; speculation includes legacy Nvidia GPUs, domestic accelerators (e.g., Huawei Ascend), and foreign data centers.
- Commenters note that training pipelines often upweight certain data sources (e.g., internal emails, Wikipedia) by sampling them more frequently during training, not by giving them “priority” at inference time.
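A minimal sketch of what upweighting by sampling frequency can look like, assuming a simple mixture where each source is drawn in proportion to its size times a multiplier. The source names, sizes, and the 3x multiplier are made up for illustration; the thread does not describe any specific pipeline.

```python
import random
from typing import Dict, List

def mixture_probs(sizes: Dict[str, float], upweight: Dict[str, float]) -> Dict[str, float]:
    """Sampling probability per source: size times multiplier, normalized to sum to 1."""
    mass = {name: sizes[name] * upweight.get(name, 1.0) for name in sizes}
    total = sum(mass.values())
    return {name: m / total for name, m in mass.items()}

def sample_sources(probs: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Draw the source of each of n training examples according to probs."""
    rng = random.Random(seed)
    names = list(probs)
    return rng.choices(names, weights=[probs[s] for s in names], k=n)

# Hypothetical corpus: 900 units of web text, 100 units of Wikipedia.
sizes = {"web": 900.0, "wikipedia": 100.0}
base = mixture_probs(sizes, {})                     # wikipedia share: 0.10
boosted = mixture_probs(sizes, {"wikipedia": 3.0})  # wikipedia share: 0.25
batch_sources = sample_sources(boosted, n=8)
```

The multiplier only changes how often the model sees a source during training; nothing in this scheme exists at inference time, which is the distinction the commenters draw.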
Model Proliferation and Architecture
- Some complain that many new LLMs are “the same thing” without architectural novelty, likening the situation to Linux distro fragmentation.
- Others counter that differences in architecture (e.g., grouped-query attention (GQA), mixture-of-experts (MoE), context length) and licensing meaningfully expand options and are part of normal scientific/engineering iteration.
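As one concrete illustration of why a detail like GQA matters in practice, the back-of-the-envelope sketch below compares key/value cache size with and without shared KV heads. The layer count, head counts, and head dimension are hypothetical and are not the configuration of any Qwen2 model.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache for one sequence: keys plus values for every layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model: 32 layers, 32 query heads of dimension 128, fp16 cache.
LAYERS, HEAD_DIM, SEQ_LEN = 32, 128, 4096

mha = kv_cache_bytes(LAYERS, kv_heads=32, head_dim=HEAD_DIM, seq_len=SEQ_LEN)  # one KV head per query head
gqa = kv_cache_bytes(LAYERS, kv_heads=8, head_dim=HEAD_DIM, seq_len=SEQ_LEN)   # 4 query heads share each KV head

print(f"MHA cache: {mha / 2**30:.2f} GiB, GQA cache: {gqa / 2**30:.2f} GiB")
# prints "MHA cache: 2.00 GiB, GQA cache: 0.50 GiB"
```

Under these assumptions the cache shrinks in direct proportion to the number of KV heads, which is one reason long context lengths are more affordable on models that use GQA.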