2026-06-18

DeepSeek Introduces Vision

Availability & API Support

Vision now appears in the DeepSeek chat UI for many users, but there is no official announcement or documentation page yet.
Multiple commenters confirm it is not available via the API at this time; several say the lack of vision support in the API is the main blocker to integrating DeepSeek into their projects or spending more on it.
Some users report having had a “vision” tab for months, but others clarify that the older feature simply piped OCR output into a text-only model, whereas the new model accepts images natively.

Capabilities & Quality

Users testing with varied photos report that DeepSeek Vision is fast and generally accurate at understanding scenes, not just reading text.
It currently only analyzes images; no one mentions image editing or generation.
Several people want it paired with other tools (e.g., Apple Vision frameworks, Playwright tests, Claude Agents, VSCode setups).

Language Behavior & Reasoning Traces

Some users see more Chinese in DeepSeek’s internal reasoning and sometimes in final answers; others never encounter this, particularly when using the API.
Explanations proposed in-thread:
- Chinese tokens are more compact, so “thinking” in Chinese might be cheaper.
- System prompts or training data may bias toward Chinese.
- Context limit issues and heavy quantization can leak non-English text.
There is broad discussion of how chain-of-thought is represented. Open models often expose their full reasoning traces, while proprietary systems may show only summaries. In either case, the reasoning can diverge from the final answer.

Voice, Speech, and Multimodality

Several commenters argue multimodal (vision + audio) is the future, but note DeepSeek still lacks built-in speech-to-text or text-to-speech in its app.
Debate over voice vs typing:
- Pro-voice: faster for many people, better flow, crucial for accessibility and hands-busy tasks (driving, walking, cooking).
- Skeptical: some dislike AI-mediated communication and worry about atrophying writing skills.

Economics, Competition, and Policy

DeepSeek is praised for extremely low pricing compared to US frontier models, making large-scale coding and image-analysis projects feasible.
Some speculate about subsidies or cheap electricity but provide no concrete evidence.
Discussion touches on global AI competition, with contrasting views on whether foreign models should be restricted or welcomed as healthy competition.
Political constraints are noted: Chinese models may censor topics like “Tank Man,” while Western models have their own guardrails; all ecosystems are seen as shaped by local norms and laws.
