Qwen3.5 Fine-Tuning Guide

Real‑world fine‑tuning use cases

  • Document classification and attribute extraction (doc type, year, subject) where small/medium models nearly match larger models but at much lower cost.
  • Labeling/categorization and data extraction, especially converting semi‑structured inputs (e.g., receipts, documents) into strict JSON or schemas.
  • Company‑specific models: internal knowledge bases, codebases, legal corpora, function‑calling over internal APIs, and generalized attribute extraction for commerce.
  • Vision + text: flood detection, handwriting recognition, receipt understanding, broader multimodal adaptation.
  • Style and domain adaptation: personal “voice” for emails/forum posts, low‑resource languages, and highly idiosyncratic prose.
  • Niche or sensitive domains whose data was filtered out of base‑model training (e.g., adult content).
  • Embedded/on‑device scenarios: tiny quantized models for games, robotics, and offline/air‑gapped systems.
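The extraction use cases above typically come down to supervised pairs of messy input and strict, schema‑valid output. A minimal sketch of what one such training pair might look like, assuming a chat‑style SFT format; the schema and field names are illustrative, not taken from the guide:

```python
import json

# One illustrative SFT training pair for receipt -> strict JSON extraction.
# Schema and field names are hypothetical.
example = {
    "messages": [
        {"role": "system",
         "content": "Extract the receipt fields and reply with JSON only: "
                    '{"merchant": string, "date": "YYYY-MM-DD", "total": number}.'},
        {"role": "user",
         "content": "ACME MART  2024-03-17\nMilk 2.49\nBread 3.00\nTOTAL 5.49"},
        {"role": "assistant",
         "content": json.dumps({"merchant": "ACME MART",
                                "date": "2024-03-17",
                                "total": 5.49})},
    ]
}

# The assistant turn must itself parse as valid JSON with the expected keys --
# a cheap sanity check worth running over the whole training set.
parsed = json.loads(example["messages"][-1]["content"])
assert set(parsed) == {"merchant", "date", "total"}
print(parsed["total"])  # 5.49
```

A few thousand pairs in this shape are what the comments below describe as enough to move a small model.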

Data, cost, and performance

  • Several comments say a “few thousand” good examples can substantially improve small models; one user reports strong results with ~1,000 examples.
  • Fine‑tuned small models are claimed to deliver ~10x fewer errors at ~100x lower inference cost than frontier APIs in some enterprise extraction tasks.
  • Batch workloads over ~100k items and repeated reruns are highlighted as especially favorable for self‑hosted fine‑tuned models.

Debate: fine‑tuning vs prompting/RAG/tools

  • Skeptical view: modern LLMs plus large context, tools, and RAG usually obviate fine‑tuning, especially for changing knowledge bases, where RAG avoids retraining.
  • Counter‑view:
    • Context is limited and competes with task input.
    • Large models are expensive and slow with big contexts.
    • Fine‑tuned small models give cheaper, faster, more deterministic, and less “distracted” behavior, especially for tightly scoped tasks and structured outputs.
    • Some domains (OOD data, new modalities, continual learning, strong style transfer) still clearly benefit.

Techniques and tooling

  • Heavy emphasis on parameter‑efficient methods: LoRA, QLoRA, prefix tuning, GRPO/RL, doc‑to‑LoRA, model routing.
  • Some friction with bitsandbytes and newer MoE/linear‑attention architectures; suggestions to train LoRA over GGUF bases.
  • Function‑calling finetunes are cited as particularly powerful compared to pure JSON prompting.
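The appeal of LoRA‑style methods is that they train a tiny fraction of the model's parameters. The mechanics can be illustrated with plain arithmetic: instead of updating a full weight matrix, LoRA trains two low‑rank factors and adds their scaled product to the frozen weight. The dimensions below are assumptions chosen to be typical of a mid‑size transformer layer:

```python
# LoRA parameter arithmetic: a frozen d_out x d_in weight W is adapted by
# adding (alpha / r) * B @ A, where B is d_out x r and A is r x d_in.
# Only B and A are trained. Dimensions here are illustrative.
d_out, d_in, r = 4096, 4096, 16

full_params = d_out * d_in          # trainable params for a full fine-tune
lora_params = d_out * r + r * d_in  # trainable params for the LoRA factors

print(full_params)                  # 16777216
print(lora_params)                  # 131072
print(full_params // lora_params)   # 128  (x fewer trainable parameters)
```

The same reduction applies per adapted matrix across the model, which is what makes multiple task‑specific adapters (and routing between them) cheap to store and swap.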

Qwen‑specific notes

  • Appreciation for the fine‑tuning guide, but concern that it currently focuses on the larger MoE models; the smaller models and the new 9B hybrid‑Mamba variants may need special treatment.
  • Some worry about recent leadership changes potentially affecting Qwen’s open‑source direction.