Tongyi DeepResearch – an open-source 30B MoE model that rivals OpenAI DeepResearch
What “Deep Research” Means in Practice
- Commenters see “deep research” as a generic pattern now: long-running, search-driven tasks that churn for minutes and return a sourced report.
- Several stress that outcomes depend heavily on the underlying model plus tooling; comparing one vendor's "DeepResearch" with another's "Deep Research" is meaningless without knowing which base model each uses.
- Tongyi's system is viewed as a "research agent" fine-tune, trained to drive a search tool in loops and write reports.
Model Architecture and Specialization
- Tongyi DeepResearch is identified as a Qwen3 30B MoE fine-tune (≈3B active parameters per token, similar to other A3B MoEs).
- Debate over whether MoE implies domain-specific experts: in most current MoEs the experts are not domain-specialized but emerge from learned routing, though there is research into domain-focused MoE routing.
- Some predict more purpose-trained models as frontier improvements slow; others argue general frontier models still dominate and specialized fine-tunes mainly matter for cost/latency and robustness in narrow domains (e.g., robotics, legal/compliance).
Self‑Hosting and Hardware Discussions
- Many practical tips: run via llama.cpp or vLLM/sglang, or use Ollama or LM Studio for quick Mac setups; quantized GGUF variants let the 30B MoE fit on modest GPUs and CPUs.
- Example rigs range from dual-3090 PCs to MacBook Pros with 64–128 GB unified memory; even older CPUs paired with midrange GPUs can drive a 30B MoE at acceptable speeds.
- Suggestions for cheap high-VRAM setups include older AMD MI50 cards under ROCm; llama.cpp's ability to offload MoE expert tensors to the CPU is mentioned as a way to stretch smaller GPUs (a minimal local-inference sketch follows this list).
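A minimal local-inference sketch, assuming a quantized GGUF loaded through llama-cpp-python; the filename, quant level, and context size are placeholders, and the same model can instead sit behind an OpenAI-compatible server from llama.cpp, vLLM/sglang, Ollama, or LM Studio:

```python
# Minimal local-inference sketch using llama-cpp-python against a quantized GGUF.
# The filename and quant level are assumptions -- substitute whatever variant you download.
from llama_cpp import Llama

llm = Llama(
    model_path="tongyi-deepresearch-30b-a3b-Q4_K_M.gguf",  # hypothetical quant filename
    n_ctx=16384,        # long contexts matter for report-style outputs
    n_gpu_layers=-1,    # offload everything that fits; reduce on smaller GPUs
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a research assistant. Cite sources."},
        {"role": "user", "content": "Summarize recent work on MoE expert routing."},
    ],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```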
Usefulness of Deep Research Tools
- Mixed experiences: many find outputs bland and surface-level, mostly structured summaries of what search already shows.
- Still, several use them heavily for market overviews, academic "has this been done before?" checks, legal/legislative summaries, and product selection, and as a way to quickly discover relevant sources.
- A repeated pattern: users skim or ignore the prose and focus on cited links, treating the agent as a smarter, iterative meta-search layer.
- Some prefer scripting their own "search + scrape + summarize" pipelines, arguing that deterministic code is more reliable than fully agentic loops (a minimal pipeline sketch follows this list).
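A minimal sketch of such a scripted pipeline, assuming a generic search API and an OpenAI-compatible endpoint; `SEARCH_URL`, the search response shape, and the model name are hypothetical placeholders:

```python
# Deterministic "search + scrape + summarize" pipeline sketch.
# SEARCH_URL and the response shape stand in for whatever search API you use;
# the summarization call assumes any OpenAI-compatible endpoint (local or hosted).
import requests
from openai import OpenAI

SEARCH_URL = "https://example-search-api.local/search"  # hypothetical search endpoint
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # e.g. a local vLLM server

def search(query: str, n: int = 10) -> list[str]:
    """Return result URLs; the JSON shape here is an assumption."""
    r = requests.get(SEARCH_URL, params={"q": query, "num": n}, timeout=30)
    r.raise_for_status()
    return [hit["url"] for hit in r.json()["results"]]

def scrape(url: str) -> str:
    """Fetch raw page text; a real pipeline would strip boilerplate HTML here."""
    return requests.get(url, timeout=30).text[:20000]

def summarize(query: str, pages: dict[str, str]) -> str:
    """Ask the model for a sourced summary over the scraped corpus."""
    corpus = "\n\n".join(f"SOURCE: {u}\n{t}" for u, t in pages.items())
    resp = client.chat.completions.create(
        model="tongyi-deepresearch-30b",  # placeholder model name
        messages=[
            {"role": "system", "content": "Write a sourced summary. Cite SOURCE URLs."},
            {"role": "user", "content": f"Question: {query}\n\n{corpus}"},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    q = "state of open-source deep research agents"
    urls = search(q)
    print(summarize(q, {u: scrape(u) for u in urls}))
```

The design point is the one commenters make: the outer loop and the source list are owned by ordinary code, and the model only writes the final summary.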
Open Source, Competition, and OpenAI’s Moat
- The thread highlights how many high-quality open and paid alternatives now exist; some feel OpenAI's moat is thin and that commoditization of the "compute layer" is underway.
- Others argue OpenAI still retains advantages: accumulated know‑how, talent, and the ChatGPT brand, plus enterprise/government sales channels.
- Several emphasize UX and orchestration (context management, long-tool-call chains, agents) as the real differentiators rather than the raw model alone.
China’s Position in AI
- Strong appreciation for Qwen3, DeepSeek, and other Chinese open models; they are seen as close to US models and very attractive for local deployment.
- Some argue China’s open releases help them capture mindshare among tinkerers and students; others note they still tend to lag US frontier models by months.
- A side debate touches on distillation from Western models and on whether being “first” in AI is actually a disadvantage when others can cheaply distill.
UX, Agents, and Limitations
- Multiple people note LLM agents are poor at exhaustive, repetitive tasks (e.g., processing 300 links); they often skip items or stop early, likely due to training and context/usage limits.
- Workarounds: have the model write code to call itself iteratively, or design external orchestrators that enforce coverage and constraints (an orchestrator sketch follows this list).
- Some see large opportunity in better “session-level” control and constraint handling, not just stronger base models.
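A sketch of the second workaround, an external orchestrator that owns the loop so no item can be skipped or dropped early; the endpoint and model name are placeholders:

```python
# External-orchestrator sketch: deterministic code guarantees every item is processed,
# while the model only ever sees one item at a time. Endpoint and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def process_one(link: str) -> str:
    resp = client.chat.completions.create(
        model="tongyi-deepresearch-30b",  # placeholder model name
        messages=[{"role": "user", "content": f"Summarize this page in two sentences: {link}"}],
    )
    return resp.choices[0].message.content

def process_all(links: list[str], retries: int = 2) -> dict[str, str]:
    results: dict[str, str] = {}
    for link in links:                      # the loop, not the model, enforces coverage
        for attempt in range(retries + 1):
            try:
                results[link] = process_one(link)
                break
            except Exception as err:        # rate limits, timeouts, etc.
                if attempt == retries:
                    results[link] = f"FAILED: {err}"
    assert len(results) == len(links)       # nothing skipped, nothing stopped early
    return results
```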
Miscellaneous
- The Tongyi blog’s CSS/Unicode choices (non‑breaking spaces plus word-break rules) make the page hard to read on some devices; one commenter posts a JS snippet to fix spacing client-side.
- Tongyi DeepResearch is available via OpenRouter (including a free tier), and there are already smaller distilled versions (e.g., Qwen3 4B) built from it for lighter local use; a hedged API example follows.
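A sketch of calling the model through OpenRouter's OpenAI-compatible API; the model slug below is an assumption and should be checked against OpenRouter's listing (including whether a `:free` variant applies):

```python
# Calling Tongyi DeepResearch through OpenRouter's OpenAI-compatible API.
# The model slug is an assumption -- verify the exact id on OpenRouter before use.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="alibaba/tongyi-deepresearch-30b-a3b",  # assumed slug; verify before use
    messages=[{"role": "user", "content": "Give a sourced overview of MoE routing research."}],
)
print(resp.choices[0].message.content)
```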