Language models are injective and hence invertible
What “invertible” refers to
- Many commenters initially misread the claim as “given the text output, you can recover the prompt.”
- Thread clarifies:
- The paper proves (for transformer LMs) that the mapping from discrete input tokens to certain continuous hidden representations is injective (“almost surely”).
- What the model deterministically produces is a next‑token probability distribution (plus intermediate activations); it is that mapping, from prompt to internal representation, that can be inverted.
- The mapping from prompts to sampled text is clearly non‑injective; collisions (“OK, got it”, “Yes”) occur constantly.
- The inversion algorithm (SipIt) reconstructs prompts from internal hidden states, not from chat‑style text responses.
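A toy sketch of what "inversion from hidden states" means in practice. This is an illustration in the spirit of the sequential, token‑by‑token approach, not the paper's SipIt algorithm; the tiny model, vocabulary size, and update rule below are invented. With white‑box access to a deterministic model, the prompt can be recovered one token at a time by testing which vocabulary item reproduces the observed hidden state at each position.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16  # toy sizes, invented for illustration


class TinyLM:
    """Deterministic toy 'language model': embeddings plus a causal recurrence."""

    def __init__(self):
        self.emb = rng.normal(size=(VOCAB, DIM))  # token embeddings
        self.mix = rng.normal(size=(DIM, DIM))    # stand-in for transformer layers

    def hidden_states(self, tokens):
        """One hidden vector per position; each depends only on the prefix so far."""
        states, h = [], np.zeros(DIM)
        for t in tokens:
            h = np.tanh(h @ self.mix + self.emb[t])  # deterministic update
            states.append(h.copy())
        return states


def invert(model, observed):
    """Recover the prompt token by token by matching the observed hidden states."""
    recovered = []
    for pos, target in enumerate(observed):
        for cand in range(VOCAB):
            if np.allclose(model.hidden_states(recovered + [cand])[pos], target):
                recovered.append(cand)
                break
    return recovered


model = TinyLM()
prompt = [7, 42, 3, 3, 19]
states = model.hidden_states(prompt)    # this is what the attacker needs access to
print(invert(model, states) == prompt)  # True: the hidden states pin down the prompt
```

Applied to a real transformer, the same loop would compare layer activations rather than this toy recurrence; the point is only that inversion requires the exact model and its internal states, not the sampled text.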
Title, communication, and hype
- Several people find the title misleading / clickbaity because most practitioners equate “language model” with “text‑in, text‑out system,” not with “deterministic map to a distribution.”
- Others argue that within the research community the title is technically precise; the confusion stems from the looser colloquial use of terms like "model" to mean the whole text‑in, text‑out product.
- Some worry hype will reduce long‑term citations; others note that in a fast field, short‑term visibility is rewarded.
Collision tests and high‑dimensional geometry
- Skeptics question the empirical claim of “no collisions in billions of tests”:
- Hidden states live on a huge continuous sphere (e.g. 768‑dimensional); the epsilon ball used to define a "collision" is vanishingly small by comparison.
- In such spaces, independent random vectors are overwhelmingly near‑orthogonal, so seeing no collisions in billions of samples is exactly what you would expect, and therefore weak evidence (see the numeric sketch after this list).
- Discussion touches on concentration of measure, birthday paradox limits, and the difference between “practically injective” and provably injective.
- Some note that even if collisions are astronomically rare, that doesn’t guarantee reliable inversion when information is truly lost (analogy to hashes).
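A small numeric illustration of the concentration‑of‑measure point (the sample count and dimension below are arbitrary choices): random directions in a 768‑dimensional space stay so far apart that observing zero near‑collisions across millions of pairs is the default outcome, not strong evidence of anything.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N = 768, 2_000                      # 2,000 vectors -> ~2 million distinct pairs

# Random unit vectors as stand-ins for hidden states.
x = rng.normal(size=(N, DIM))
x /= np.linalg.norm(x, axis=1, keepdims=True)

# Pairwise cosine similarities, upper triangle only (distinct pairs).
cos = x @ x.T
iu = np.triu_indices(N, k=1)
pair_cos = cos[iu]
print(f"{pair_cos.size:,} pairs, max |cosine| = {np.abs(pair_cos).max():.3f}")

# Minimum pairwise Euclidean distance: ||u - v||^2 = 2 - 2*cos for unit vectors.
min_dist = np.sqrt((2.0 - 2.0 * pair_cos).min())
print(f"min pairwise distance = {min_dist:.3f}  (vs. a tiny collision epsilon)")
```

Birthday‑style estimates scale the same way to billions of samples: independent random probes essentially never land inside the same tiny epsilon ball, so the test cannot distinguish "injective" from "collisions exist but random sampling will never find them."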
Privacy, security, and embeddings
- Because hidden states (and embeddings) can in principle reconstruct prompts, storing or exposing them is not privacy‑preserving.
- This reinforces prior work showing “embeddings reveal almost as much as text” and undercuts the notion that vector DBs are inherently anonymizing.
- Suggested mitigations include random orthogonal rotations of embeddings or splitting sequences across machines (related obfuscation/defense work is cited; a minimal rotation sketch follows this list).
- However, most production systems only expose final sampled text, so direct prompt recovery from network responses remains out of scope.
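A minimal sketch of the rotation idea mentioned above (the QR‑based construction and the shapes are my own illustration, not taken from the cited defense work): a secret random orthogonal matrix preserves the geometry a vector DB needs for retrieval while leaving the stored vectors misaligned with the embedding model's original space.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 768

# Secret random orthogonal matrix (QR decomposition of a Gaussian matrix).
q, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))

emb = rng.normal(size=(100, DIM))  # stand-in for real text embeddings
rotated = emb @ q                  # what actually gets stored in the vector DB

# Retrieval-relevant geometry is unchanged: inner products (and hence cosine
# similarities and Euclidean distances) are identical before and after rotation.
print(np.allclose(emb @ emb.T, rotated @ rotated.T))  # True
```

This is obfuscation rather than a hard guarantee: anyone who obtains the rotation matrix, or who can query the same embedding model enough to fit a linear map back to the original space, can undo it.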
Conceptual implications for how LLMs work
- Result supports the view that transformers “project and store” input rather than discarding it; in‑context “learning” may just be manipulating a rich, largely lossless representation.
- Some see this as consistent with why models can repeat or condition on arbitrary “garbage” sequences: the residual stream must preserve them to perform tasks like copying.
- Debates arise over whether this counts as "abstraction" or merely compression/curve‑fitting; an analogy is drawn to how data becomes highly compressible once you know the rule that generated it.
Limitations, edge cases, and potential uses
- The result concerns the deterministic forward pass of a model with fixed weights and a fixed context window, given access to hidden activations; per author clarifications cited in the thread, it does not enable recovering training data.
- "Almost surely injective" leaves open rare collisions; how that translates into guarantees for inversion in adversarial or worst‑case settings is unclear (see the note after this list).
- Possible applications discussed:
- Attacking prompt‑hiding schemes in hosted inference.
- Checking for AI‑generated text or recovering prompts—though in practice this would require the exact model, internal states, and unedited outputs, making it fragile.
- Awareness that any stored intermediate states may be equivalent, for legal and compliance purposes, to storing the raw prompt.
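One common way to read the "almost surely" qualifier noted above (my paraphrase, not the paper's exact theorem statement) is as a claim about the draw of model weights rather than about any one fixed model:

```latex
% Paraphrase only, not the paper's exact theorem statement:
% collisions occupy a measure-zero set of weight configurations, i.e. for model
% weights \theta drawn from a continuous distribution,
\Pr_{\theta}\!\left[\, \exists\, x \neq x' \ \text{s.t.}\ f_{\theta}(x) = f_{\theta}(x') \,\right] = 0
```

Read this way, the guarantee attaches to typical weights, which is why the thread treats the adversarial or worst‑case story for inversion as still open.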