DeepSeek releases Janus Pro, a text-to-image generator [pdf]

Market impact, Nvidia, and AI economics

  • Many see DeepSeek’s efficiency (e.g., Janus trained on relatively modest A100 clusters; R1’s training cost reportedly in the low millions of dollars) as undermining the narratives that justify massive AI capex and endless GPU demand.
  • Bear case: if similar or better results need far fewer GPUs, prior NVDA valuations based on “billions in infra” and permanent chip scarcity were over-optimistic; efficiency effectively multiplies existing premium GPU supply and pressures prices.
  • Bull case (Jevons paradox): cheaper “intelligence” increases total consumption of AI; Nvidia remains supply‑constrained and a de‑facto GPU monopoly, so demand will catch up and keep prices high.
  • Some argue the stock reaction reflects herd psychology and misunderstanding (“NVDA = AI”), not fundamentals; others think markets are correctly updating that AI demand is finite in current form.

Janus Pro capabilities and limitations

  • Janus Pro is a 7B “unified” autoregressive LLM/VLM (not a diffusion model) that benchmarks well on text–image tasks and multimodal understanding, under a permissive but non‑FOSS license (no military use, plus some content restrictions).
  • Major caveat: native output is only 384×384, below even the older SD 1.5’s 512×512; upscaling is possible, but quality may still lag SDXL, Flux, or Imagen in practice.
  • Several users testing older Janus variants (e.g., 1.3B Flow) found quality around SD 1.x and weaker than DALL‑E 3/Flux; the paper’s gains seem more about prompt understanding and captioning than raw image fidelity.
  • Current demos may not yet expose the full Pro model; fine‑grained editing features common in flagship multimodal LLMs are not obvious here.
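To put the resolution gap in concrete terms: Janus Pro’s native 384×384 output can be upscaled with plain resampling, though results will be softer than what a learned super‑resolution model (e.g., an ESRGAN‑family upscaler) produces. A minimal sketch using Pillow; the 2× target and the Lanczos filter are illustrative choices, not anything from the paper:

```python
from PIL import Image

NATIVE = (384, 384)   # Janus Pro's native output resolution
TARGET = (768, 768)   # illustrative 2x target, not from the paper

def naive_upscale(img: Image.Image, size: tuple = TARGET) -> Image.Image:
    # Lanczos resampling: cheap and dependency-free, but blurrier than
    # a learned super-resolution model on fine texture.
    return img.resize(size, Image.LANCZOS)

# Placeholder image standing in for a generated sample.
sample = Image.new("RGB", NATIVE, color="gray")
print(naive_upscale(sample).size)  # (768, 768)
```

This closes the pixel‑count gap but not the detail gap, which is why commenters expect upscaled Janus output to still trail SDXL/Flux on fidelity.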

Licensing, openness, and strategic intent

  • DeepSeek’s licenses allow commercial use but restrict certain domains, making the models “open weights” rather than fully open source; even so, they are looser than Llama’s restrictions on competing models.
  • Weights and training pipelines are described; pretraining data remains largely opaque, which commenters see as the real “secret sauce.”
  • Widespread view: DeepSeek’s parent (a quant hedge fund) may prioritize eroding Western AI moats and commoditizing models over building a closed OpenAI competitor, leveraging and reinforcing the open‑source ecosystem.
  • Some see this as a deliberate geopolitical move: show that US chip/export controls can be routed around via algorithmic efficiency, and push global users toward inexpensive, China‑origin models.

Censorship, bias, and trust

  • Multiple examples show DeepSeek models refusing or deflecting on topics like Tiananmen Square, Taiwan, Xinjiang, or criticism of CCP leaders, sometimes even in local/offline distills.
  • There’s evidence of layered censorship: safety filters at the API level plus some behaviors baked into the base weights, likely inherited from RLHF/distillation data sourced from already‑aligned models such as ChatGPT or Qwen.
  • Many point out Western models are also heavily censored (e.g., on certain political, religious, copyright, or “DEI” topics); the main difference is which taboos are enforced.
  • Open‑weights are valued because users can self‑host, finetune, or “uncensor,” but concern remains that subtle propaganda baked into weights would be hard to detect or remove.

Reasoning, agents, and limits of current GenAI

  • One long thread argues that current stochastic sequence models lack mechanisms for strict deductive constraints, causal reasoning, and genuine goal‑directed agency, so they can’t reliably act as “employees” pursuing business goals.
  • Counterarguments:
    • Practical systems already combine multiple models with validators, retrying generation until outputs pass checks; this is expensive but often still cheaper than human labor.
    • Much white‑collar work is still repetitive data shuffling that is highly automatable; AI will likely augment one human to replace several, rather than fully replace “intentional” roles soon.
  • Broad agreement that today’s best use cases are extraction, synthesis, coding help, and narrow automations; fully reliable autonomous agents remain unsolved.
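The generate‑validate‑retry pattern mentioned in the counterarguments can be sketched in a few lines. Everything below is a stand‑in: the deliberately flaky generator and the trivial uppercase “validator” are toy assumptions; a real system would call a model API and validate with schema parsing, unit tests, or a second model acting as judge:

```python
def flaky_generate(prompt: str, attempt: int) -> str:
    # Stand-in for a model call: returns malformed output on early attempts.
    return prompt.upper() if attempt >= 3 else prompt

def validate(output: str) -> bool:
    # Stand-in check: a real validator might run tests or parse a schema.
    return output.isupper()

def generate_until_valid(prompt: str, max_attempts: int = 5) -> str:
    # Retry generation until the output passes validation or budget runs out.
    for attempt in range(1, max_attempts + 1):
        candidate = flaky_generate(prompt, attempt)
        if validate(candidate):
            return candidate
    raise RuntimeError(f"no valid output after {max_attempts} attempts")

print(generate_until_valid("hello"))  # HELLO (passes on the third attempt)
```

The retry budget is the cost lever: each attempt burns inference spend, which is why the thread frames this as “expensive but often cheaper than human labor” rather than free reliability.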

Chinese leadership, geopolitics, and Western reaction

  • Many commenters highlight a pattern: Western threads fixate on Tiananmen/Winnie‑the‑Pooh and CCP control instead of engaging with the technical achievement (small‑scale training, strong reasoning, open weights).
  • Others argue distrust is justified when a strategic technology is tightly coupled to an authoritarian state, especially if its worldview is subtly aligned with that state’s narratives.
  • There’s a sense that US tech and investors over‑indexed on scaling brute compute (“Stargate”), assuming a long‑lived moat, and have been blindsided by a lean, open, Chinese effort that stands on top of global open research and tooling.