DeepSeek Introduces Vision

Availability & API Support

  • Vision now appears in the DeepSeek chat UI for many users, but there is no official announcement or documentation page yet.
  • Multiple commenters confirm it is not available via the API at this time; several say the lack of API vision is the main blocker for integrating DeepSeek into projects or spending more on it.
  • Some users report having had the “vision” tab for months, but others clarify that the older feature was just OCR piped into a text-only model, whereas the model now accepts images natively.
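
The OCR-versus-native distinction in the last point can be made concrete with a toy sketch. Everything here is hypothetical: `Image`, `run_ocr`, `text_only_model`, and `native_vision_model` are illustrative stubs, not DeepSeek code or API calls. The sketch only shows why an OCR-first pipeline cannot describe a scene that contains no text.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Toy stand-in for a photo (hypothetical, for illustration only)."""
    printed_text: str  # text an OCR engine could read from the image
    scene: str         # visual content that is not text


def run_ocr(image: Image) -> str:
    # Hypothetical OCR step: only the printed text survives this stage.
    return image.printed_text


def text_only_model(prompt: str) -> str:
    # Hypothetical text-only model: it knows nothing beyond its prompt.
    return f"I can see this text: {prompt!r}" if prompt else "I see no text."


def ocr_pipeline(image: Image) -> str:
    # Older behaviour as described in the thread: OCR output piped into a text model.
    return text_only_model(run_ocr(image))


def native_vision_model(image: Image) -> str:
    # Hypothetical natively multimodal model: it receives the whole image.
    parts = [f"Scene: {image.scene}"]
    if image.printed_text:
        parts.append(f"Text: {image.printed_text!r}")
    return "; ".join(parts)


if __name__ == "__main__":
    photo = Image(printed_text="", scene="a dog on a beach")
    print(ocr_pipeline(photo))         # the scene never reaches the model
    print(native_vision_model(photo))  # the scene is part of the input
```

With a text-free photo, the OCR path has nothing to pass along, which matches the thread's point that scene understanding requires native image input.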

Capabilities & Quality

  • Users testing with varied photos report that DeepSeek Vision is fast and generally accurate at understanding scenes, not just reading text.
  • It currently only analyzes images; no image editing/generation is mentioned.
  • Several people want it paired with other tools (e.g., Apple Vision frameworks, Playwright tests, Claude Agents, VSCode setups).

Language Behavior & Reasoning Traces

  • Some users see more Chinese in DeepSeek’s internal reasoning and sometimes in final answers; others never encounter this, especially via the API.
  • Explanations proposed in-thread:
    • Chinese tokens are more compact, so “thinking” in Chinese might be cheaper.
    • System prompts or training data may bias toward Chinese.
    • Context-limit issues and heavy quantization can cause non-English text to leak into the output.
  • There is broad discussion of how chain-of-thought is exposed: open models often show their raw reasoning traces, proprietary systems may show only summaries, and the displayed reasoning can diverge from the final answer.
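
Since reports differ on how often Chinese appears, one rough way to check saved outputs is to measure the share of Chinese characters in a reasoning trace or answer. The following is a minimal sketch, not something proposed in the thread: the function name `cjk_ratio` is hypothetical, and it counts only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so it ignores Chinese punctuation and would also count Japanese kanji.

```python
def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK ideographs.

    Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
    is counted; punctuation and extension blocks are treated as non-CJK.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0  # empty or whitespace-only input has no measurable share
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


if __name__ == "__main__":
    trace = "First, 我们 check the image."  # hypothetical mixed-language trace
    print(f"{cjk_ratio(trace):.2%} of characters are CJK ideographs")
```

Running this over a batch of saved traces gives a per-response number to compare between the chat UI and the API, rather than relying on impressions.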

Voice, Speech, and Multimodality

  • Several commenters argue multimodal (vision + audio) is the future, but note DeepSeek still lacks built-in speech-to-text or text-to-speech in its app.
  • Debate over voice vs typing:
    • Pro-voice: faster for many people, better flow, crucial for accessibility and hands-busy tasks (driving, walking, cooking).
    • Skeptical: some dislike AI-mediated communication and worry about atrophying writing skills.

Economics, Competition, and Policy

  • DeepSeek is praised for extremely low pricing compared to US frontier models, making large-scale coding and image-analysis projects feasible.
  • Some speculate about subsidies or cheap electricity but provide no concrete evidence.
  • Discussion touches on global AI competition, with contrasting views on whether foreign models should be restricted vs. welcomed as healthy competition.
  • Political constraints are noted: Chinese models may censor topics like “Tank Man,” while Western models have their own guardrails; all ecosystems are seen as shaped by local norms and laws.