@allen_ai: New updates for olmOCR, our fu...

@allen_ai
17 views Jun 20, 2025
1
New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:

1️⃣ New benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ Docker image for easy deployment
2
Most OCR benchmarks compare model output to a fixed reference text. This approach can be misleading, since it penalizes correct outputs that differ from the reference only in style.

For olmOCR-Bench, we created 7000+ unit tests over 1400+ documents to test core extraction capabilities.
3
Our tests check if math equations are transcribed, tables contain the right values, boilerplate is removed, etc. Each one is designed to be simple, unambiguous, and machine-verifiable.
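
To make the contrast concrete, here is a minimal sketch of how a unit-test style check differs from comparing against a fixed reference text. It is an illustration only, not olmOCR-Bench's actual code: the helper names (`check_present`, `check_absent`, `check_order`) and the sample strings are hypothetical, and the similarity score uses Python's standard `difflib` as a stand-in for a reference-based metric.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return " ".join(text.lower().split())


# Hypothetical check helpers (not the olmOCR-Bench API).
def check_present(output: str, snippet: str) -> bool:
    """Pass if the snippet appears somewhere in the OCR output."""
    return normalize(snippet) in normalize(output)


def check_absent(output: str, snippet: str) -> bool:
    """Pass if the snippet (e.g. a page footer) was removed from the output."""
    return not check_present(output, snippet)


def check_order(output: str, first: str, second: str) -> bool:
    """Pass if both snippets appear and `first` comes before `second`."""
    text = normalize(output)
    i = text.find(normalize(first))
    j = text.find(normalize(second))
    return 0 <= i < j


# Two transcriptions of the same page that differ only in heading style.
reference = "# Results\n\nAccuracy was 91.2% on the test set."
candidate = "Results\n=======\n\nAccuracy was 91.2% on the test set."

# Reference-based scoring: the stylistic difference lowers the score.
similarity = SequenceMatcher(None, reference, candidate).ratio()

# Unit-test scoring: each check is a simple pass/fail on the content itself.
checks = [
    check_present(candidate, "Accuracy was 91.2%"),
    check_absent(candidate, "Page 3 of 10"),
    check_order(candidate, "Results", "Accuracy was"),
]

print(f"similarity to reference: {similarity:.2f}")
print(f"unit tests passed: {sum(checks)}/{len(checks)}")
```

In this sketch the candidate passes every check even though its similarity to the reference is below 1.0, which is the kind of style-only mismatch described above.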

On these rigorous tests, olmOCR outperforms every other model we compared it against.
4
olmOCR is also now easier to use than ever:

- Simpler installation
- Prebuilt Docker containers
- Upgraded to the latest vLLM version (support for quantization coming soon!)
- Better performance with improved sampling, a tweaked retry strategy, and a cleaner prompt!
5
Run olmOCR-bench yourself: github.com/allenai/olmocr…

OCR your own documents: github.com/allenai/olmocr

Try the olmOCR online demo: olmocr.allenai.org

Read our updated technical report: olmocr.allenai.org/papers/olmocr.…