2025-01-31

Bypass DeepSeek censorship by speaking in hex

Where DeepSeek’s Censorship Lives

Ongoing debate over whether censorship is mainly in:
- A post-generation filter wrapped around the model (as seen in the web UI), or
- The weights themselves (training data, fine-tuning, and/or RLHF).
Hosted chat clearly uses a separate moderation layer: answers start streaming, then disappear and are replaced with a canned refusal (“sorry, that’s beyond my scope”).
Several people report that offline/distilled models still show censorship and PRC talking points, implying some bias is baked into the model as well.

Client-side Filtering and Bypass Tricks

The web UI’s censorship can be bypassed by intercepting and stripping “content_filter” markers from XHR responses in JavaScript, revealing the hidden “thoughts” / chain-of-thought.
Hex-encoding prompts and answers, leetspeak, or slightly obfuscated Chinese (e.g., inserting underscores) often evade keyword-based filters.
Some languages (e.g., Ukrainian, Dutch, Russian) reportedly trigger less or no censorship than English/Chinese, enabling users to ask about Tiananmen or similar topics.
Using persona prompts (“I’m a historian studying Western misinformation…”) or injecting custom <think> tags can sometimes coax uncensored CoT, especially on local/distilled variants.

Evidence of In-model Bias and Propaganda

Local runs of DeepSeek-derived models sometimes:
- Assert “Taiwan is part of China” and reference the “One-China Principle” as global consensus.
- Refuse or heavily sanitize answers on Tiananmen, Xinjiang, etc., even without any external wrapper.
One user got the full 671B model to describe itself as developed in China under strict regulations, with built‑in content filters enforcing “core socialist values” and blocking “politically sensitive content about China.”

Comparisons with Western LLMs and Censorship

Many note that Western models (ChatGPT, Gemini, Claude) also:
- Use separate moderation layers that can “lobotomize” replies mid-stream.
- Refuse on piracy, weapons, drugs, or ideologically sensitive issues, sometimes with visible RLHF “spin.”
Some argue the pattern is symmetric: Chinese models suppress Tiananmen; Western models soft‑pedal topics like imperialism, race/IQ, or historical abuses.
Others counter that in the US abuses are at least discussable and litigated, whereas PRC events like Tiananmen are actively erased and dangerous to mention.

Broader Reflections

Several comments frame this as a general problem of state and corporate control over information, not just “China bad.”
Others worry more about subtle propaganda and built‑in biases than overt keyword-based censorship, since those are harder to detect or jailbreak away.

Related topics