DeepSeek Introduces Vision
Availability & API Support
- Vision support now appears in the DeepSeek chat UI for many users, but there is no official announcement or documentation page yet.
- Multiple commenters confirm it is not available via the API at this time; several say the lack of vision in the API is the main blocker to integrating DeepSeek into their projects or spending more on it.
- Some users report having had the “vision” tab for months, but others clarify that the older feature was just OCR piped into a text-only model, whereas the model now accepts images natively.
Capabilities & Quality
- Users testing with varied photos report that DeepSeek Vision is fast and generally accurate at understanding scenes, not just reading text.
- It currently only analyzes images; no image editing/generation is mentioned.
- Several people want it paired with other tools (e.g., Apple Vision frameworks, Playwright tests, Claude Agents, VSCode setups).
Language Behavior & Reasoning Traces
- Some users see more Chinese in DeepSeek’s internal reasoning and sometimes in final answers; others never encounter this, especially via the API.
- Explanations proposed in-thread:
  - Chinese tokens are more compact, so “thinking” in Chinese might be cheaper.
  - System prompts or training data may bias the model toward Chinese.
  - Hitting context limits and heavy quantization can cause non-English text to leak into output.
- There is broad discussion of how chain-of-thought is represented: open models often expose their raw reasoning traces, proprietary systems may show only summaries, and the reasoning can diverge from the final answer.
Voice, Speech, and Multimodality
- Several commenters argue multimodal (vision + audio) is the future, but note DeepSeek still lacks built-in speech-to-text or text-to-speech in its app.
- Debate over voice vs. typing:
  - Pro-voice: faster for many people, better flow, crucial for accessibility and hands-busy tasks (driving, walking, cooking).
  - Skeptical: some dislike AI-mediated communication and worry about atrophying writing skills.
Economics, Competition, and Policy
- DeepSeek is praised for extremely low pricing compared to US frontier models, making large-scale coding and image-analysis projects feasible.
- Some speculate about subsidies or cheap electricity but provide no concrete evidence.
- Discussion touches on global AI competition, with contrasting views on whether foreign models should be restricted vs. welcomed as healthy competition.
- Political constraints are noted: Chinese models may censor topics like “Tank Man,” while Western models have their own guardrails; all ecosystems are seen as shaped by local norms and laws.