Qwen3.5 Fine-Tuning Guide
Real‑world fine‑tuning use cases
- Document classification and attribute extraction (doc type, year, subject), where small/medium models nearly match larger models at much lower cost.
- Labeling/categorization and data extraction, especially converting semi‑structured inputs (e.g., receipts, documents) into strict JSON or schemas.
- Company‑specific models: internal knowledge bases, codebases, legal corpora, function-calling over internal APIs, and generalized attribute extraction for commerce.
- Vision + text: flood detection, handwriting recognition, receipt understanding, and broader multimodal adaptation.
- Style and domain adaptation: personal “voice” for emails/forum posts, low‑resource languages, and highly idiosyncratic prose.
- Niche or sensitive domains where base models were filtered (e.g., adult content).
- Embedded/on‑device scenarios: tiny quantized models for games, robotics, and offline/air‑gapped systems.
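For the structured-extraction use cases above, fine-tuning data is typically a set of input → strict-JSON pairs stored as JSONL. A minimal sketch, assuming a doc-type/year/subject schema (the field names, file name, and example texts are hypothetical, not from the guide):

```python
import json

# Hypothetical training examples for an attribute-extraction fine-tune:
# each record pairs raw document text with the strict JSON the model
# should emit. Stored as JSONL, one example per line.
examples = [
    {
        "prompt": "Invoice #4821 dated 2023-07-14 from Acme Corp ...",
        "completion": json.dumps(
            {"doc_type": "invoice", "year": 2023, "subject": "Acme Corp"}
        ),
    },
    {
        "prompt": "Lease agreement signed March 2021 between ...",
        "completion": json.dumps(
            {"doc_type": "lease", "year": 2021, "subject": "lease agreement"}
        ),
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Because the completion side is always valid JSON against one schema, the fine-tuned model learns the output contract directly instead of relying on prompt-time schema instructions.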
Data, cost, and performance
- Several comments say a “few thousand” good examples can substantially improve small models; one user reports strong results with ~1,000 examples.
- Fine‑tuned small models are claimed to deliver ~10x fewer errors at ~100x lower inference cost than frontier APIs in some enterprise extraction tasks.
- Batch workloads over ~100k items and repeated reruns are highlighted as especially favorable for self‑hosted fine‑tuned models.
Debate: fine‑tuning vs prompting/RAG/tools
- Skeptical view: modern LLMs plus large context windows, tools, and RAG usually obviate fine‑tuning, especially for changing knowledge bases, where RAG avoids retraining.
- Counter‑view:
  - Context is limited and competes with the task input itself.
  - Large models are expensive and slow with big contexts.
  - Fine‑tuned small models are cheaper, faster, more deterministic, and less "distracted", especially for tightly scoped tasks and structured outputs.
  - Some domains (out‑of‑distribution data, new modalities, continual learning, strong style transfer) still clearly benefit.
Techniques and tooling
- Heavy emphasis on parameter‑efficient methods: LoRA, QLoRA, prefix tuning, GRPO/RL, doc‑to‑LoRA, model routing.
- Some users report friction with bitsandbytes and newer MoE/linear‑attention architectures; one suggested workaround is training LoRA adapters over GGUF bases.
- Function‑calling finetunes are cited as particularly powerful compared to pure JSON prompting.
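The parameter-efficiency argument behind LoRA comes down to simple arithmetic: instead of updating a full weight matrix, you train two low-rank factors and add their product to the frozen weights. A minimal sketch of that count (the 4096×4096 layer size and rank 16 are illustrative assumptions, not values from the guide):

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tune params, LoRA adapter params) for one
    d_in x d_out linear layer. LoRA freezes the original weight W and
    learns B (d_out x r) and A (r x d_in), applying W + B @ A."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

# Illustrative 4096x4096 attention projection with rank-16 adapters:
full, lora = lora_param_counts(4096, 4096, 16)
print(full, lora, full // lora)  # 128x fewer trainable params for this layer
```

QLoRA pushes the same idea further by also quantizing the frozen base weights (e.g., to 4-bit), so only the small adapters are kept in full precision during training.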
Qwen‑specific notes
- Appreciation for the fine‑tuning guide, but concern that it currently focuses on larger MoE models; smaller and new 9B hybrid‑Mamba variants may need special treatment.
- Some worry about recent leadership changes potentially affecting Qwen’s open‑source direction.