Language models are injective and hence invertible
What “invertible” refers to
- Many commenters initially misread the claim as “given the text output, you can recover the prompt.”
- Thread clarifies:
- The paper proves (for transformer LMs) that the mapping from discrete input tokens to certain continuous hidden representations is injective (“almost surely”).
- What the model deterministically produces is a next‑token probability distribution (plus intermediate activations); it is that mapping, from prompt to internal representation, that can be inverted.
- The mapping from prompts to sampled text is clearly non‑injective; collisions (“OK, got it”, “Yes”) occur constantly.
- The inversion algorithm (SipIt) reconstructs prompts from internal hidden states, not from chat‑style text responses.
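A toy sketch of what "inversion from hidden states" means in practice. This is an illustration in the spirit of the sequential, token‑by‑token approach, not the paper's SipIt algorithm; the tiny model, vocabulary size, and update rule below are invented. With white‑box access to a deterministic model, the prompt can be recovered one token at a time by testing which vocabulary item reproduces the observed hidden state at each position.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16  # toy sizes, invented for illustration


class TinyLM:
    """Deterministic toy 'language model': embeddings plus a causal recurrence."""

    def __init__(self):
        self.emb = rng.normal(size=(VOCAB, DIM))  # token embeddings
        self.mix = rng.normal(size=(DIM, DIM))    # stand-in for transformer layers

    def hidden_states(self, tokens):
        """One hidden vector per position; each depends only on the prefix so far."""
        states, h = [], np.zeros(DIM)
        for t in tokens:
            h = np.tanh(h @ self.mix + self.emb[t])  # deterministic update
            states.append(h.copy())
        return states


def invert(model, observed):
    """Recover the prompt token by token by matching the observed hidden states."""
    recovered = []
    for pos, target in enumerate(observed):
        for cand in range(VOCAB):
            if np.allclose(model.hidden_states(recovered + [cand])[pos], target):
                recovered.append(cand)
                break
    return recovered


model = TinyLM()
prompt = [7, 42, 3, 3, 19]
states = model.hidden_states(prompt)    # this is what the attacker needs access to
print(invert(model, states) == prompt)  # True: the hidden states pin down the prompt
```

Applied to a real transformer, the same loop would compare layer activations rather than this toy recurrence; the point is only that inversion requires the exact model and its internal states, not the sampled text.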
Title, communication, and hype
- Several people find the title misleading / clickbaity because most practitioners equate “language model” with “text‑in, text‑out system,” not with “deterministic map to a distribution.”
- Others argue that within the research community the title is technically precise; the confusion stems from the looser colloquial use of terms like "model" to mean the whole text‑in, text‑out product.
- Some worry hype will reduce long‑term citations; others note that in a fast field, short‑term visibility is rewarded.
Collision tests and high‑dimensional geometry
- Skeptics question the empirical claim of “no collisions in billions of tests”:
- Hidden states live on a huge continuous sphere (e.g. 768‑dimensional); the epsilon ball used to define a "collision" is vanishingly small by comparison.
- In such spaces, independent random vectors are overwhelmingly near‑orthogonal, so seeing no collisions in billions of samples is exactly what you would expect, and therefore weak evidence (see the numeric sketch after this list).
- Discussion touches on concentration of measure, birthday paradox limits, and the difference between “practically injective” and provably injective.
- Some note that even if collisions are astronomically rare, that doesn’t guarantee reliable inversion when information is truly lost (analogy to hashes).
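A small numeric illustration of the concentration‑of‑measure point (the sample count and dimension below are arbitrary choices): random directions in a 768‑dimensional space stay so far apart that observing zero near‑collisions across millions of pairs is the default outcome, not strong evidence of anything.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N = 768, 2_000                      # 2,000 vectors -> ~2 million distinct pairs

# Random unit vectors as stand-ins for hidden states.
x = rng.normal(size=(N, DIM))
x /= np.linalg.norm(x, axis=1, keepdims=True)

# Pairwise cosine similarities, upper triangle only (distinct pairs).
cos = x @ x.T
iu = np.triu_indices(N, k=1)
pair_cos = cos[iu]
print(f"{pair_cos.size:,} pairs, max |cosine| = {np.abs(pair_cos).max():.3f}")

# Minimum pairwise Euclidean distance: ||u - v||^2 = 2 - 2*cos for unit vectors.
min_dist = np.sqrt((2.0 - 2.0 * pair_cos).min())
print(f"min pairwise distance = {min_dist:.3f}  (vs. a tiny collision epsilon)")
```

Birthday‑style estimates scale the same way to billions of samples: independent random probes essentially never land inside the same tiny epsilon ball, so the test cannot distinguish "injective" from "collisions exist but random sampling will never find them."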
Privacy, security, and embeddings
- Because hidden states (and embeddings) can in principle reconstruct prompts, storing or exposing them is not privacy‑preserving.
- This reinforces prior work showing “embeddings reveal almost as much as text” and undercuts the notion that vector DBs are inherently anonymizing.
- Suggested mitigations include random orthogonal rotations of embeddings or splitting sequences across machines (related obfuscation/defense work is cited; a minimal rotation sketch follows this list).
- However, most production systems only expose final sampled text, so direct prompt recovery from network responses remains out of scope.
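A minimal sketch of the rotation idea mentioned above (the QR‑based construction and the shapes are my own illustration, not taken from the cited defense work): a secret random orthogonal matrix preserves the geometry a vector DB needs for retrieval while leaving the stored vectors misaligned with the embedding model's original space.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 768

# Secret random orthogonal matrix (QR decomposition of a Gaussian matrix).
q, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))

emb = rng.normal(size=(100, DIM))  # stand-in for real text embeddings
rotated = emb @ q                  # what actually gets stored in the vector DB

# Retrieval-relevant geometry is unchanged: inner products (and hence cosine
# similarities and Euclidean distances) are identical before and after rotation.
print(np.allclose(emb @ emb.T, rotated @ rotated.T))  # True
```

This is obfuscation rather than a hard guarantee: anyone who obtains the rotation matrix, or who can query the same embedding model enough to fit a linear map back to the original space, can undo it.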
Conceptual implications for how LLMs work
- Result supports the view that transformers “project and store” input rather than discarding it; in‑context “learning” may just be manipulating a rich, largely lossless representation.
- Some see this as consistent with why models can repeat or condition on arbitrary “garbage” sequences: the residual stream must preserve them to perform tasks like copying.
- Debates arise over whether this counts as "abstraction" or merely compression/curve‑fitting; an analogy is drawn to how data becomes highly compressible once you know the rule that generated it.
Limitations, edge cases, and potential uses
- The result concerns the deterministic forward pass of a model with fixed weights and a fixed context window, given access to hidden activations; per author clarifications cited in the thread, it does not enable recovering training data.
- "Almost surely injective" leaves open rare collisions; how that translates into guarantees for inversion in adversarial or worst‑case settings is unclear (see the note after this list).
- Possible applications discussed:
- Attacking prompt‑hiding schemes in hosted inference.
- Checking for AI‑generated text or recovering prompts—though in practice this would require the exact model, internal states, and unedited outputs, making it fragile.
- Awareness that any stored intermediate states may be equivalent, for legal and compliance purposes, to storing the raw prompt.
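One common way to read the "almost surely" qualifier noted above (my paraphrase, not the paper's exact theorem statement) is as a claim about the draw of model weights rather than about any one fixed model:

```latex
% Paraphrase only, not the paper's exact theorem statement:
% collisions occupy a measure-zero set of weight configurations, i.e. for model
% weights \theta drawn from a continuous distribution,
\Pr_{\theta}\!\left[\, \exists\, x \neq x' \ \text{s.t.}\ f_{\theta}(x) = f_{\theta}(x') \,\right] = 0
```

Read this way, the guarantee attaches to typical weights, which is why the thread treats the adversarial or worst‑case story for inversion as still open.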