@omarsar0: How much do LLMs memorize?Me...
@omarsar0
Jun 08, 2025
Two-part decomposition of memorization
The paper formalizes unintended memorization (data-specific info) vs. generalization (distribution-level knowledge), and introduces a Kolmogorov-inspired compression-based metric to distinguish them.
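One way to picture the compression view is sketched below. It assumes, as a simplification not spelled out in this thread, that a sequence's code length under a model is its negative log2-likelihood, and that unintended memorization is the number of bits the trained model saves beyond what a reference model already explains. The function names and the reference-model setup are hypothetical.

```python
import math


def code_length_bits(token_logprobs):
    """Code length in bits of a sequence, given natural-log token probabilities."""
    return -sum(token_logprobs) / math.log(2)


def unintended_memorization_bits(target_logprobs, reference_logprobs):
    """Bits saved by the target model beyond the reference model (illustrative estimator)."""
    ref_bits = code_length_bits(reference_logprobs)
    tgt_bits = code_length_bits(target_logprobs)
    # Assumption: with both models available, the encoder uses the shorter code.
    joint_bits = min(ref_bits, tgt_bits)
    return ref_bits - joint_bits
```

Under these assumptions, a sample the reference model already predicts well contributes close to zero bits, which is what separates generalization from data-specific memorization.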
GPT capacity = 3.6 bits/parameter
By training on synthetic datasets with known entropy, the authors find that GPT-style models trained in bfloat16 consistently memorize 3.5–3.6 bits per parameter, a figure that increases slightly with fp32 precision.
Their results show that GPT-family models can store about 3.6 bits per parameter, and they propose scaling laws to predict memorization and membership inference.
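To get a feel for the number, here is a small back-of-the-envelope sketch. It simply multiplies a parameter count by the roughly 3.6 bits-per-parameter figure quoted above; the helper names are made up for illustration, and megabytes are decimal (1 MB = 10^6 bytes).

```python
BITS_PER_PARAM = 3.6  # approximate figure quoted in the thread


def capacity_bits(num_params, bits_per_param=BITS_PER_PARAM):
    """Estimated memorization capacity in bits."""
    return num_params * bits_per_param


def capacity_megabytes(num_params, bits_per_param=BITS_PER_PARAM):
    """Same estimate expressed in decimal megabytes."""
    return capacity_bits(num_params, bits_per_param) / 8 / 1e6
```

For a hypothetical 1-billion-parameter model this works out to about 3.6 billion bits, or roughly 450 MB of raw memorization capacity.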
Scaling laws for membership inference
The authors empirically derive a sigmoidal scaling law for membership inference performance as a function of capacity-to-data ratio.
As the dataset size grows relative to model capacity, membership inference performance falls to chance (F1 ≈ 0.5), which explains why large models trained on huge corpora are resilient to this attack.
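The shape of such a curve can be sketched as follows. This is not the paper's fitted law: the `slope` and `midpoint` values are placeholder assumptions, and taking the log of the ratio is a choice made here for readability. It only illustrates a sigmoid that sits at chance (0.5) when data dwarfs capacity and approaches 1.0 when capacity dwarfs data.

```python
import math


def membership_f1(capacity_to_data_ratio, slope=1.0, midpoint=0.0):
    """Illustrative sigmoidal F1 curve over the capacity-to-data ratio.

    slope and midpoint are hypothetical placeholders, not fitted coefficients.
    """
    log_ratio = math.log(capacity_to_data_ratio)
    sigmoid = 1.0 / (1.0 + math.exp(-slope * (log_ratio - midpoint)))
    # Rescale so the curve runs from 0.5 (chance) up to 1.0.
    return 0.5 + 0.5 * sigmoid
```

With these placeholder values, a ratio far below 1 gives an F1 near 0.5 and a ratio far above 1 gives an F1 near 1.0.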
Membership inference is easier than extraction
Even when extraction fails, models can still exhibit detectable membership bias via loss differences, especially when trained on small datasets.
The paper also shows that extraction success converges to test-level generalization as datasets grow large.
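The loss-difference signal mentioned above can be illustrated with a simple loss-threshold attack. This is a generic sketch, not necessarily the paper's exact procedure: a sample is guessed to be a training member when its loss falls below a threshold, and the guesses are scored with F1. The threshold and function names are assumptions.

```python
def predict_members(losses, threshold):
    """Guess 'member' when the model's loss on a sample is below the threshold."""
    return [loss < threshold for loss in losses]


def f1_score(predictions, labels):
    """F1 of boolean membership predictions against boolean ground truth."""
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum((not p) and y for p, y in zip(predictions, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

When member and non-member losses are well separated, as they tend to be on small datasets, this kind of attack scores highly even if no training text can be extracted verbatim.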
Overall, the authors measured the capacity of modern transformer language models and analyzed how measurements such as extraction and F1 score scale with model and dataset size.
Paper: arxiv.org/abs/2505.24832






