Language models are injective and hence invertible

What “invertible” refers to

  • Many commenters initially misread the claim as “given the text output, you can recover the prompt.”
  • Thread clarifies:
    • The paper proves (for transformer LMs) that the mapping from discrete input token sequences to certain continuous hidden representations is injective “almost surely”, i.e. for all but a measure‑zero set of parameter choices.
    • The model’s forward pass outputs a next‑token probability distribution (and intermediate activations); it is this deterministic mapping that can be inverted.
    • The mapping from prompts to sampled text is clearly non‑injective; collisions (“OK, got it”, “Yes”) occur constantly.
  • The inversion algorithm (SipIt) reconstructs prompts from internal hidden states, not from chat‑style text responses; a toy sketch of this kind of inversion follows this list.
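
The distinction is easiest to see in code. The sketch below is a toy, not the paper’s setup: toy_hidden_state is a made‑up deterministic recurrence standing in for a transformer’s per‑position hidden states, and the brute‑force loop only illustrates the SipIt‑style idea of recovering the prompt left to right by finding the token that reproduces each observed state, not the paper’s actual algorithm.

```python
# Toy illustration of prompt recovery from per-position hidden states.
# Everything here is a stand-in: toy_hidden_state is NOT a transformer,
# and invert() is a brute-force simplification, not the SipIt algorithm.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16
EMBED = rng.normal(size=(VOCAB, DIM))              # fixed "token embeddings"
MIX = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)   # fixed "layer" weights

def toy_hidden_state(tokens):
    """Deterministic, prefix-dependent hidden state for a token sequence."""
    h = np.zeros(DIM)
    for t in tokens:
        h = np.tanh(MIX @ h + EMBED[t])            # position-by-position update
    return h

def invert(hidden_states):
    """Recover the prompt left to right by matching each observed state."""
    recovered = []
    for target in hidden_states:
        for cand in range(VOCAB):                  # try every vocabulary item
            if np.allclose(toy_hidden_state(recovered + [cand]), target):
                recovered.append(cand)             # unique match in practice
                break
    return recovered

prompt = [7, 23, 4, 41, 7]
states = [toy_hidden_state(prompt[:i + 1]) for i in range(len(prompt))]
assert invert(states) == prompt                    # exact recovery from states
```

Nothing comparable is possible from the sampled text alone, which is the non‑injective mapping most commenters had in mind.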

Title, communication, and hype

  • Several people find the title misleading / clickbaity because most practitioners equate “language model” with “text‑in, text‑out system,” not with “deterministic map to a distribution.”
  • Others argue that within the research community the title is technically precise; the confusion stems from public misuse of terms like “model”.
  • Some worry hype will reduce long‑term citations; others note that in a fast field, short‑term visibility is rewarded.

Collision tests and high‑dimensional geometry

  • Skeptics question the empirical claim of “no collisions in billions of tests”:
    • Hidden states live on a huge continuous sphere (e.g. 768‑D); the epsilon ball used for “collision” is extremely tiny.
    • In such spaces, random vectors are overwhelmingly near‑orthogonal, so seeing no collisions in billions of samples is expected and therefore weak evidence (a quick numerical check follows this list).
  • Discussion touches on concentration of measure, birthday paradox limits, and the difference between “practically injective” and provably injective.
  • Some note that even if collisions are astronomically rare, that doesn’t guarantee reliable inversion when information is truly lost (analogy to hashes).
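
The geometric objection can be checked numerically. The snippet below uses illustrative numbers (768 dimensions, a few thousand random unit vectors, an arbitrary epsilon of 1e‑3), not the paper’s models or thresholds; it only shows that independent high‑dimensional vectors essentially never fall within a tiny epsilon ball of one another, so “no collisions observed” is the expected outcome with or without injectivity.

```python
# Near-orthogonality and "no collisions" for random high-dimensional vectors.
# Dimensions, sample size, and epsilon are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
DIM, EPS = 768, 1e-3

# Random points on the unit sphere: a crude stand-in for hidden states.
x = rng.normal(size=(4000, DIM))
x /= np.linalg.norm(x, axis=1, keepdims=True)

# Cosine similarities between independent vectors concentrate near zero;
# typical magnitude is on the order of 1/sqrt(DIM), roughly 0.036 here.
cos = (x[:2000] @ x[2000:].T).ravel()
print("mean |cos|:", np.abs(cos).mean())

# Minimum pairwise distance in the sample is nowhere near the epsilon ball
# used to declare a "collision", so zero observed collisions is unsurprising.
sample = x[:2000]
sq = np.sum(sample**2, axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * (sample @ sample.T)
np.fill_diagonal(d2, np.inf)
print("min pairwise distance:", float(np.sqrt(max(d2.min(), 0.0))), "vs eps:", EPS)
```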

Privacy, security, and embeddings

  • Because prompts can in principle be reconstructed from hidden states (and embeddings), storing or exposing those vectors is not privacy‑preserving.
  • This reinforces prior work showing “embeddings reveal almost as much as text” and undercuts the notion that vector DBs are inherently anonymizing.
  • Suggested mitigations include rotating embeddings by a secret random orthogonal matrix or splitting sequences across machines (related obfuscation/defense work is cited); a sketch of the rotation idea follows this list.
  • However, most production systems only expose final sampled text, so direct prompt recovery from network responses remains out of scope.
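
The rotation mitigation can be sketched in a few lines. This illustrates the general idea of applying a secret random orthogonal matrix to stored embeddings; it is not the specific defense work cited in the thread, and all sizes and names are made up.

```python
# Sketch of the "random orthogonal rotation" idea for stored embeddings:
# distances and dot products (hence nearest-neighbour retrieval) are
# preserved, but the coordinates no longer match the model's own basis
# unless the secret rotation Q is known. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
DIM, N = 768, 1000

emb = rng.normal(size=(N, DIM))                   # pretend embeddings

# Haar-random orthogonal matrix via QR decomposition (kept secret).
Q, R = np.linalg.qr(rng.normal(size=(DIM, DIM)))
Q *= np.sign(np.diag(R))                          # fix QR sign convention

rotated = emb @ Q                                 # what actually gets stored

# Pairwise geometry is identical, so retrieval quality is untouched...
orig_gram = emb @ emb.T
rot_gram = rotated @ rotated.T
print("max Gram-matrix difference:", np.abs(orig_gram - rot_gram).max())  # round-off only

# ...but individual coordinates no longer line up with the model's basis.
print("coordinate correlation:", np.corrcoef(emb[0], rotated[0])[0, 1])   # near 0
```

Because all pairwise geometry is preserved, this hides only the coordinate basis; an attacker who recovers Q, or who can collect rotated‑embedding/text pairs, could still attempt inversion, which is why the thread groups such schemes under obfuscation/defense work rather than guarantees.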

Conceptual implications for how LLMs work

  • Result supports the view that transformers “project and store” input rather than discarding it; in‑context “learning” may just be manipulating a rich, largely lossless representation.
  • Some see this as consistent with why models can repeat or condition on arbitrary “garbage” sequences: the residual stream must preserve them to perform tasks like copying.
  • Debates arise over whether this counts as “abstraction” or merely compression/curve‑fitting; analogy made to compressing data once you understand an underlying rule.

Limitations, edge cases, and potential uses

  • The result concerns deterministic models with fixed context windows and assumes access to hidden activations; per author clarifications mentioned in the thread, it does not enable recovering training data.
  • “Almost surely injective” leaves open rare collisions; how that translates into guarantees for inversion in adversarial or worst‑case settings is unclear.
  • Possible applications discussed:
    • Attacking prompt‑hiding schemes in hosted inference.
    • Detecting AI‑generated text or recovering prompts, though in practice this would require the exact model, its internal states, and unedited outputs, making the approach fragile.
    • Awareness that stored intermediate states may, for legal and compliance purposes, be equivalent to storing the raw prompt.