🚨 DeepSeek just did something wild.
They built an OCR system that compresses long text into vision tokens, literally turning paragraphs into pixels.
Their model, DeepSeek-OCR, achieves 97% decoding precision at 10× compression and still manages about 60% accuracy at 20×. That means an image of a document can represent its text using a fraction of the tokens an LLM would need to read that text directly.
Even crazier? It beats GOT-OCR2.0 and MinerU2.0 while using up to 60× fewer tokens, and it can process 200K+ pages per day on a single A100.
This could solve one of AI’s biggest problems: long-context inefficiency.
Instead of paying more for longer sequences, models might soon see text instead of reading it.
The future of context compression might not be textual at all.
It might be optical 👁️
github.com/deepseek-ai/DeepSeek-OCR

1. Vision-Text Compression: The Core Idea
LLMs struggle with long documents because attention cost scales quadratically with sequence length.
DeepSeek-OCR flips that: instead of reading text, it encodes full documents as vision tokens, each representing a compressed piece of visual information.
Result: at ~10× compression, you can fit 10 pages' worth of text into the token budget GPT-4 would spend on 1 page of plain text.
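The arithmetic behind this claim fits in a few lines. This is a back-of-the-envelope sketch, not DeepSeek's code: it assumes a hypothetical 1,000 text tokens per page and a flat 10× ratio, and it only counts attention pairs (real tokenizers, layouts, and total compute vary).

```python
# Illustrative only: full self-attention scores every token against every
# other token, so its cost grows with n * n. TEXT_TOKENS_PER_PAGE is a
# hypothetical assumption, not a figure from DeepSeek-OCR.
TEXT_TOKENS_PER_PAGE = 1_000  # hypothetical
COMPRESSION = 10              # ~10x, where the thread reports ~97% precision


def attention_pairs(n_tokens: int) -> int:
    """Number of query-key pairs one full self-attention layer scores."""
    return n_tokens * n_tokens


def vision_tokens_for(pages: int, ratio: int = COMPRESSION) -> int:
    """Vision tokens needed if the pages' text is compressed by `ratio`."""
    return pages * TEXT_TOKENS_PER_PAGE // ratio


pages = 10
text_tokens = pages * TEXT_TOKENS_PER_PAGE    # 10,000 text tokens
vision_tokens = vision_tokens_for(pages)      # 1,000, i.e. one page of text

print(text_tokens, vision_tokens)
print(attention_pairs(text_tokens) // attention_pairs(vision_tokens))
```

Because the cost is quadratic, a 10× shorter sequence means roughly 100× fewer attention pairs, which is why compression pays off more than linearly.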

2. DeepEncoder - The Optical Compressor
Meet the star: DeepEncoder.
It uses two backbones, SAM (for perception) and CLIP (for global vision), bridged by a 16× convolutional compressor.
This lets it keep high-resolution understanding without exploding activation memory.
The encoder converts thousands of image patches into a few hundred compact vision tokens.
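Here is how "thousands of patches → a few hundred tokens" can work out numerically. It is a token-counting sketch only and does not run SAM or CLIP. The 16×16 patch size and the compressor shape (two 3×3 convolutions with stride 2 and padding 1, which halve each spatial axis twice for a 16× token reduction) are assumptions to check against the DeepSeek-OCR repo. The function names are hypothetical.

```python
# Rough token-count sketch of a DeepEncoder-style pipeline.
# Assumptions (not confirmed by this thread): 16x16 patches and a compressor
# made of two 3x3 convolutions with stride 2 and padding 1.
PATCH = 16


def conv_out(size: int, kernel: int = 3, stride: int = 2, pad: int = 1) -> int:
    """Standard convolution output-size formula along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1


def encoder_tokens(height: int, width: int, n_convs: int = 2) -> tuple[int, int]:
    """Return (patch tokens before compression, tokens after compression)."""
    gh, gw = height // PATCH, width // PATCH  # patch grid for the SAM stage
    patches = gh * gw
    for _ in range(n_convs):                  # each stride-2 conv halves both axes
        gh, gw = conv_out(gh), conv_out(gw)
    return patches, gh * gw                   # compressed grid for the CLIP stage


before, after = encoder_tokens(1024, 1024)
print(before, after, before // after)
```

With these assumptions, a 1024×1024 page becomes a 64×64 grid of 4,096 patches, then 32×32, then 16×16 = 256 tokens. The expensive global-attention stage only sees the compressed grid, which is how activation memory stays bounded.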

3. Multi-Resolution “Gundam” Mode
Documents vary widely: invoices, blueprints, newspapers.
To handle this, DeepSeek-OCR supports multiple resolution modes: Tiny, Small, Base, Large, and Gundam.
Resolutions scale efficiently from 512×512 to 1280×1280, and Gundam mode combines local tiles with a global view.
One model, multiple resolutions, no retraining.
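A sketch of how the modes trade resolution for token budget. The per-mode resolutions, the 640-pixel Gundam tiles, and the 1024-pixel global view are assumptions, since the thread only gives the overall 512×512 to 1280×1280 range. The tiling rule in `gundam_tokens` is hypothetical. Token counts reuse the 16×16-patch, 16×-compression assumption.

```python
import math

# Assumed native resolution per mode; the thread only gives the overall
# 512x512 to 1280x1280 range, so verify these against the DeepSeek-OCR repo.
MODES = {"Tiny": 512, "Small": 640, "Base": 1024, "Large": 1280}
PATCH, COMPRESSION = 16, 16   # assumed patch size and token compression
TILE, GLOBAL = 640, 1024      # assumed Gundam tile and global-view sizes


def tokens_for_square(side: int) -> int:
    """Vision tokens for a side x side image after 16x compression."""
    return (side // PATCH) ** 2 // COMPRESSION


def gundam_tokens(height: int, width: int) -> int:
    """Hypothetical tiling rule: cover the page with TILE-sized local
    tiles, then add one downscaled global view."""
    n_tiles = math.ceil(height / TILE) * math.ceil(width / TILE)
    return n_tiles * tokens_for_square(TILE) + tokens_for_square(GLOBAL)


for name, side in MODES.items():
    print(f"{name:<6} {side}x{side}: {tokens_for_square(side)} tokens")
print("Gundam 1280x1920:", gundam_tokens(1280, 1920), "tokens")
```

Under these assumptions, the native modes cost 64, 100, 256, and 400 tokens. A 1280×1920 page in Gundam mode becomes 6 tiles × 100 tokens plus a 256-token global view, or 856 tokens. Large or dense pages pay for detail, and small ones stay cheap.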

4. Data Engine: OCR 1.0 to 2.0
They didn’t just train on text scans.
DeepSeek-OCR’s data includes:
• 30M+ PDF pages across 100 languages
• 10M natural scene OCR samples
• 10M charts + 5M chemical formulas + 1M geometry problems
It’s not just reading; it’s parsing scientific diagrams, equations, and layouts.

5. Why It Matters
This isn’t “just another OCR.”
It’s a proof of concept for context compression.
If text can be represented visually with 10× fewer tokens, LLMs could use the same idea for long-term memory and efficient reasoning.
Imagine GPT-5 processing a 1M-token document as a 100K-token image map.
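A quick way to sanity-check scenarios like this is a small helper. The two precision figures come from this thread (97% at ~10×, about 60% at 20×). The function names and the regime labels are illustrative assumptions. The thread does not say how precision behaves between or beyond those points.

```python
import math

# Back-of-the-envelope check for optical context compression.
# Precision figures are the ones quoted in this thread; everything else,
# including these helper names, is hypothetical.


def vision_budget(text_tokens: int, max_ratio: float = 10) -> int:
    """Smallest vision-token budget that keeps compression <= max_ratio."""
    return math.ceil(text_tokens / max_ratio)


def reported_regime(ratio: float) -> str:
    """Place a compression ratio relative to the figures the thread reports."""
    if ratio <= 10:
        return "<=10x: thread reports ~97% precision"
    if ratio <= 20:
        return "10-20x: precision drops, reaching ~60% at 20x"
    return ">20x: outside the reported range"


budget = vision_budget(1_000_000)  # the 1M-token document above
ratio = 1_000_000 / budget
print(budget, ratio, reported_regime(ratio))
```

The 1M → 100K scenario sits right at the 10× edge, where the thread's 97% figure applies. Squeezing further buys tokens at the cost of accuracy.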

Stop wasting hours writing prompts
✅ 10,000+ ready-to-use prompts
✅ Create your own in seconds
✅ Lifetime access. One-time payment.
Claim your copy 👇
godofprompt.ai/pricing