Qwen3.5 Fine-Tuning Guide
Real‑world fine‑tuning use cases
- Document classification and attribute extraction (doc type, year, subject), where small/medium models nearly match larger models at much lower cost.
- Labeling/categorization and data extraction, especially converting semi‑structured inputs (e.g., receipts, documents) into strict JSON or schemas.
- Company‑specific models: internal knowledge bases, codebases, legal corpora, function-calling over internal APIs, and generalized attribute extraction for commerce.
- Vision + text: flood detection, handwriting recognition, receipt understanding, and broader multimodal adaptation.
- Style and domain adaptation: personal “voice” for emails/forum posts, low‑resource languages, and highly idiosyncratic prose.
- Niche or sensitive domains where base models were filtered (e.g., adult content).
- Embedded/on‑device scenarios: tiny quantized models for games, robotics, and offline/air‑gapped systems.
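For the structured-extraction use cases above, fine-tuning data is typically a set of input → strict-JSON pairs stored as JSONL. A minimal sketch, assuming a doc-type/year/subject schema (the field names, file name, and example texts are hypothetical, not from the guide):

```python
import json

# Hypothetical training examples for an attribute-extraction fine-tune:
# each record pairs raw document text with the strict JSON the model
# should emit. Stored as JSONL, one example per line.
examples = [
    {
        "prompt": "Invoice #4821 dated 2023-07-14 from Acme Corp ...",
        "completion": json.dumps(
            {"doc_type": "invoice", "year": 2023, "subject": "Acme Corp"}
        ),
    },
    {
        "prompt": "Lease agreement signed March 2021 between ...",
        "completion": json.dumps(
            {"doc_type": "lease", "year": 2021, "subject": "lease agreement"}
        ),
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Because the completion side is always valid JSON against one schema, the fine-tuned model learns the output contract directly instead of relying on prompt-time schema instructions.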
Data, cost, and performance
- Several comments say a “few thousand” good examples can substantially improve small models; one user reports strong results with ~1,000 examples.
- Fine‑tuned small models are claimed to deliver ~10x fewer errors at ~100x lower inference cost than frontier APIs in some enterprise extraction tasks.
- Batch workloads over ~100k items and repeated reruns are highlighted as especially favorable for self‑hosted fine‑tuned models.
Debate: fine‑tuning vs prompting/RAG/tools
- Skeptical view: modern LLMs plus large context windows, tools, and RAG usually obviate fine‑tuning, especially for changing knowledge bases, where RAG avoids retraining.
- Counter‑view:
  - Context is limited and competes with the task input itself.
  - Large models are expensive and slow with big contexts.
  - Fine‑tuned small models are cheaper, faster, more deterministic, and less "distracted", especially for tightly scoped tasks and structured outputs.
  - Some domains (out‑of‑distribution data, new modalities, continual learning, strong style transfer) still clearly benefit.
Techniques and tooling
- Heavy emphasis on parameter‑efficient methods: LoRA, QLoRA, prefix tuning, GRPO/RL, doc‑to‑LoRA, model routing.
- Some users report friction with bitsandbytes and newer MoE/linear‑attention architectures; one suggested workaround is training LoRA adapters over GGUF bases.
- Function‑calling finetunes are cited as particularly powerful compared to pure JSON prompting.
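The parameter-efficiency argument behind LoRA comes down to simple arithmetic: instead of updating a full weight matrix, you train two low-rank factors and add their product to the frozen weights. A minimal sketch of that count (the 4096×4096 layer size and rank 16 are illustrative assumptions, not values from the guide):

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tune params, LoRA adapter params) for one
    d_in x d_out linear layer. LoRA freezes the original weight W and
    learns B (d_out x r) and A (r x d_in), applying W + B @ A."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

# Illustrative 4096x4096 attention projection with rank-16 adapters:
full, lora = lora_param_counts(4096, 4096, 16)
print(full, lora, full // lora)  # 128x fewer trainable params for this layer
```

QLoRA pushes the same idea further by also quantizing the frozen base weights (e.g., to 4-bit), so only the small adapters are kept in full precision during training.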
Qwen‑specific notes
- Appreciation for the fine‑tuning guide, but concern that it currently focuses on larger MoE models; smaller and new 9B hybrid‑Mamba variants may need special treatment.
- Some worry about recent leadership changes potentially affecting Qwen’s open‑source direction.