holy sh*t... your llm remembers everything you typed 🤯
researchers just proved you can recover the EXACT input text from a language model's hidden states.
not similar text. not approximate.
the actual words you typed.
here's what they found:
• transformer language models are mathematically injective
• different inputs = different hidden states (with probability 1)
• this isn't a coincidence or training artifact, it's structural
• they built SIPIT, an algorithm that inverts the model in linear time
• tested with billions of collision checks across GPT-2, Gemma, Llama
• 100% exact recovery rate. zero collisions found.
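to make the injectivity claim concrete, here's a minimal sketch of a collision test on a toy model. everything in it is an assumption for illustration: a single causal attention layer with random weights, a 4-token vocabulary, and hypothetical names like `last_hidden_state`. it is not the paper's experimental setup, just the same idea at tiny scale: enumerate prompts, compute the last-token hidden state, and look for two prompts that land on the same vector.

```python
import itertools
import math
import random

D, VOCAB, MAX_LEN = 8, 4, 4            # toy sizes, chosen only for illustration
rng = random.Random(0)                 # fixed seed so the run is repeatable

def rand_matrix(rows, cols, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

E = rand_matrix(VOCAB, D)              # token embeddings (random, untrained)
P = rand_matrix(MAX_LEN, D)            # positional embeddings (random, untrained)
WQ, WK, WV = (rand_matrix(D, D, D ** -0.5) for _ in range(3))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def last_hidden_state(tokens):
    """One attention layer plus residual; returns the state at the last position."""
    xs = [[e + p for e, p in zip(E[tok], P[i])] for i, tok in enumerate(tokens)]
    q = matvec(WQ, xs[-1])
    keys = [matvec(WK, x) for x in xs]
    vals = [matvec(WV, x) for x in xs]
    # the last position attends to the whole prompt (itself and everything before it)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    attn = [sum(w * v[j] for w, v in zip(weights, vals)) / z for j in range(D)]
    return [x + math.tanh(a) for x, a in zip(xs[-1], attn)]   # tanh is real-analytic

# every prompt of length 1..MAX_LEN over the toy vocabulary
prompts = [p for n in range(1, MAX_LEN + 1)
           for p in itertools.product(range(VOCAB), repeat=n)]
states = [last_hidden_state(p) for p in prompts]

# a collision would show up as a pairwise distance of exactly zero
min_dist = min(math.dist(a, b) for a, b in itertools.combinations(states, 2))
print(f"{len(prompts)} prompts, smallest pairwise distance = {min_dist:.4f}")
```

with random weights the smallest distance should come out strictly positive, meaning no two of the 340 prompts share a hidden state. the positional embeddings matter in this toy: without `P`, the prompts `(0,)` and `(0, 0)` would produce the same vector.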
the math is airtight.
transformers are real-analytic functions, which means collisions can only happen on measure-zero parameter sets.
at random init?
probability zero.
after gradient descent?
still zero.
you cannot accidentally make these models lossy.
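the measure-zero step can be written out in a few lines. this is a sketch of the standard argument, not the paper's full proof: the notation (weights θ, hidden state h_θ(s), Lebesgue measure λ, an initialization distribution μ with a density) is assumed here, it relies on exhibiting at least one weight setting that separates each pair of prompts, and it does not cover the gradient descent part.

```latex
% assumed notation: \theta \in \mathbb{R}^p are the weights,
% h_\theta(s) is the hidden state the model produces for prompt s
\[
g_{s,s'}(\theta) \;=\; \lVert h_\theta(s) - h_\theta(s') \rVert_2^2 ,
\qquad s \neq s'
\]
% g is real-analytic in \theta because the network is a composition of
% real-analytic maps; a real-analytic function that is not identically
% zero vanishes only on a set of Lebesgue measure zero
\[
g_{s,s'} \not\equiv 0
\;\Longrightarrow\;
\lambda\bigl(\{\theta : g_{s,s'}(\theta) = 0\}\bigr) = 0
\]
% there are finitely many prompts up to a maximum length, so the union
% over all pairs is still a measure-zero set
\[
\lambda\Bigl(\,\bigcup_{s \neq s'} \{\theta : g_{s,s'}(\theta) = 0\}\Bigr) = 0
\;\Longrightarrow\;
\Pr_{\theta \sim \mu}\bigl[\,\exists\, s \neq s' : h_\theta(s) = h_\theta(s')\,\bigr] = 0
\]
```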
the information doesn't compress. it doesn't abstract.
it just transforms into a different representation that perfectly preserves every token.
your prompt never leaves the model. it just lives in 768 dimensions (GPT-2 small's hidden size) instead of text.
this changes everything about how we think about llm internals, interpretability, and what "representations" actually mean.

the privacy implications are absolutely brutal.
if you store embeddings, you're storing the original text. period.
any system that keeps hidden states, and has the model weights, can reconstruct your input word for word.
doesn't matter if they "deleted the prompt" or claim the data is "anonymized."
the Hamburg Data Protection Commissioner argued that model weights don't contain personal data because you can't trivially extract training examples. cool.
but at inference time? your input is sitting right there in the activations, perfectly recoverable.
every api returning embeddings is effectively leaking your raw prompt.
every vector database is a text database in disguise.
every "we only keep representations for safety monitoring" is storing your actual words.
there's no free privacy once your data enters a transformer. the architecture itself prevents information loss.

why does this happen?
causal attention means each token only sees the prefix.
real-analytic activations (tanh, GELU) keep the whole network real-analytic. and with probability 1, gradient descent never moves parameters into the measure-zero collision set.
the result: injective maps from prompts to hidden states, preserved throughout training.
they even show the margin between different inputs grows with depth. deeper layers = more separation = easier inversion.
this isn't just theory.
SIPIT reconstructs prompts 100x faster than alternatives with perfect accuracy.
works on any layer. scales linearly with sequence length.
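the "one token at a time" idea can be sketched on a toy model. this is not SIPIT itself: it swaps in a brute-force search over a 16-token vocabulary, and the one-layer model, the sizes, and the names (`state_at`, `invert`) are all hypothetical. what it shares with the description above is the structure: causal attention means the state at position t depends only on tokens 0..t, so once the prefix is known, each new token can be pinned down on its own.

```python
import math
import random

D, VOCAB, MAX_LEN = 8, 16, 12          # toy sizes, chosen only for illustration
rng = random.Random(1)                 # fixed seed so the run is repeatable

def rand_matrix(rows, cols, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

E, P = rand_matrix(VOCAB, D), rand_matrix(MAX_LEN, D)
WQ, WK, WV = (rand_matrix(D, D, D ** -0.5) for _ in range(3))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def state_at(tokens, t):
    """Hidden state at position t; causal masking means it only reads tokens[:t+1]."""
    xs = [[e + p for e, p in zip(E[tok], P[i])]
          for i, tok in enumerate(tokens[:t + 1])]
    q = matvec(WQ, xs[-1])
    scores = [sum(a * b for a, b in zip(q, matvec(WK, x))) / math.sqrt(D) for x in xs]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    vals = [matvec(WV, x) for x in xs]
    attn = [sum(w * v[j] for w, v in zip(weights, vals)) / z for j in range(D)]
    return [x + math.tanh(a) for x, a in zip(xs[-1], attn)]

def invert(states):
    """Recover tokens left to right by matching each stored state against every candidate."""
    recovered, checks = [], 0
    for t, target in enumerate(states):
        best_tok, best_dist = None, float("inf")
        for cand in range(VOCAB):      # brute force: try every vocabulary entry
            checks += 1
            d = math.dist(state_at(recovered + [cand], t), target)
            if d < best_dist:
                best_tok, best_dist = cand, d
        recovered.append(best_tok)     # the recovered prefix is reused at the next position
    return recovered, checks

prompt = [rng.randrange(VOCAB) for _ in range(10)]
states = [state_at(prompt, t) for t in range(len(prompt))]   # what a system might log
recovered, checks = invert(states)
print(recovered == prompt, checks)
```

the loop makes a single pass over the positions, so the number of candidate checks is sequence length × vocabulary size (10 × 16 = 160 here). note what it needs to work: the per-position states and the model weights.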
implications:
→ mechanistic interpretability gets a clean baseline
→ hidden states aren't abstractions, they're the input
→ if a probe fails, blame the probe, not missing info
→ compliance frameworks need updating immediately
paper: arxiv.org/abs/2510.15511
transformers are structurally lossless. your text never compresses.
it just shape-shifts.
TL;DR for normal people:
when you type something into chatgpt,
the model converts your words into numbers (called "hidden states" or "embeddings").
everyone assumed this conversion loses information, like how a jpeg compresses a photo.
wrong.
researchers proved the conversion is perfectly reversible.
they built an algorithm that takes those numbers and reconstructs your EXACT original text with 100% accuracy.
think of it like this: you thought the model was taking notes. it's actually recording everything.
what this means:
→ if a company saves your "embeddings," they have your actual words
→ deleting your chat but keeping the embeddings changes nothing
→ any claim that "we only store representations" is meaningless
→ vector databases, api logs, safety monitoring... all contain your original text
the paper proves this isn't fixable.
it's not a bug or oversight. the math of how transformers work REQUIRES them to preserve your input perfectly.
you literally cannot build a transformer that forgets by accident.
bottom line: there's no such thing as "just the embeddings."
there's only your text in a different format.
