Ai2
@allen_ai
New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:

1️⃣ New benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ Docker image for easy deployment

Most OCR benchmarks compare model output to a fixed reference text. This approach can be misleading because it penalizes correct outputs that differ from the reference only in style.
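
To make the style penalty concrete, here is a minimal sketch (not olmOCR-Bench code). It scores two hypothetical OCR outputs against a fixed reference, using Python's `difflib.SequenceMatcher` as a stand-in for whatever similarity metric a reference-based benchmark might use. The sample strings and the choice of metric are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for an edit-distance metric."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical reference transcription and two hypothetical OCR outputs.
reference = "**Total:** 42 units\n\n* apples\n* pears"
# Same content, but bold and bullets use equivalent Markdown syntax.
same_content_other_style = "__Total:__ 42 units\n\n- apples\n- pears"
# Same style as the reference, but one digit is wrong.
wrong_number = "**Total:** 47 units\n\n* apples\n* pears"

style_score = similarity(reference, same_content_other_style)
error_score = similarity(reference, wrong_number)

print(f"correct content, different style: {style_score:.2f}")
print(f"wrong value, matching style:      {error_score:.2f}")
```

With these inputs the stylistically different but correct output scores lower than the output containing a wrong number, which is the failure mode described above.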

For olmOCR-Bench, we created 7000+ unit tests over 1400+ documents to test core extraction capabilities.

Our tests check whether math equations are transcribed, tables contain the right values, boilerplate is removed, and so on. Each one is designed to be simple, unambiguous, and machine-verifiable.
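
A minimal sketch of what simple, machine-verifiable checks of this kind could look like. This is not the olmOCR-Bench implementation: the function names (`text_present`, `text_absent`, `table_cell_equals`), the whitespace normalization, and the sample Markdown are all hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not affect a check."""
    return re.sub(r"\s+", " ", text).strip()


def text_present(output: str, snippet: str) -> bool:
    """Pass if the snippet (e.g. an equation) appears in the OCR output."""
    return normalize(snippet) in normalize(output)


def text_absent(output: str, snippet: str) -> bool:
    """Pass if boilerplate such as a page footer was removed."""
    return normalize(snippet) not in normalize(output)


def table_cell_equals(output: str, row_label: str, col: int, expected: str) -> bool:
    """Pass if the Markdown table row starting with row_label has `expected` in column `col`."""
    for line in output.splitlines():
        if not line.strip().startswith("|"):
            continue  # not a table row
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if cells and cells[0] == row_label:
            return col < len(cells) and cells[col] == expected
    return False


# Hypothetical OCR output for one page.
ocr_output = """
The energy is $E = mc^2$ for a body
at rest.

| Model | Score |
|-------|-------|
| A     | 71.5  |
| B     | 68.2  |
"""

results = {
    "equation transcribed": text_present(ocr_output, "$E = mc^2$"),
    "footer removed": text_absent(ocr_output, "Page 3 of 12"),
    "table value correct": table_cell_equals(ocr_output, "B", 1, "68.2"),
}
for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

Each check returns a plain pass or fail and does not depend on how the rest of the page is formatted, so two outputs that differ only in style get the same result.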

On these rigorous tests, olmOCR outperforms every other model we compared it against.

olmOCR is also now easier to use than ever:

- Simpler installation
- Prebuilt Docker containers
- Upgraded to the latest vLLM version (support for quantization coming soon!)
- Better performance from improved sampling, a tweaked retry strategy, and a cleaner prompt!

Run olmOCR-Bench yourself: github.com/allenai/olmocr…

OCR your own documents: github.com/allenai/olmocr

Try the olmOCR online demo: olmocr.allenai.org

Read our updated technical report: olmocr.allenai.org/papers/olmocr.…