Visualize Thread by @allen_ai

Ai2

@allen_ai

New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:

1️⃣ New benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ Docker image for easy deployment

Most OCR benchmarks compare model output to a fixed reference text. This approach can be misleading, since it penalizes correct outputs that differ in style only.
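To make the pitfall concrete, here is a minimal sketch using Python's standard `difflib`. The sample strings and the `similarity` helper are illustrative assumptions, not taken from olmOCR-Bench or any specific benchmark. It shows a reference-based score ranking an output with a wrong value above an output that is correct but styled differently.

```python
from difflib import SequenceMatcher


def similarity(reference: str, candidate: str) -> float:
    """Character-level similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, reference, candidate).ratio()


# Hypothetical reference transcription of a small table.
reference = "| Year | Revenue |\n| 2023 | $1.2M |"

# Correct values, but written in a different style than the reference.
correct_other_style = "Year: 2023\nRevenue: $1.2M"

# Same style as the reference, but one digit is wrong.
wrong_same_style = "| Year | Revenue |\n| 2023 | $7.2M |"

style_score = similarity(reference, correct_other_style)
wrong_score = similarity(reference, wrong_same_style)

# The incorrect output scores higher than the correct one.
print(f"correct, different style: {style_score:.2f}")
print(f"wrong value, same style:  {wrong_score:.2f}")
```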

For olmOCR-Bench, we created 7000+ unit tests over 1400+ documents to test core extraction capabilities.

Our tests check if math equations are transcribed, tables contain the right values, boilerplate is removed, etc. Each one is designed to be simple, unambiguous, and machine-verifiable.
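A rough sketch of what such machine-verifiable checks could look like in Python. The `Check`, `present`, `absent`, and `run_checks` names and the sample document are hypothetical, written for illustration only; they are not olmOCR-Bench's actual test format or API.

```python
import re
from dataclasses import dataclass
from typing import Callable


def normalize(text: str) -> str:
    """Collapse whitespace so layout differences do not affect a check."""
    return re.sub(r"\s+", " ", text).strip()


@dataclass
class Check:
    name: str
    passed: Callable[[str], bool]  # takes OCR output, returns pass/fail


def present(snippet: str) -> Callable[[str], bool]:
    """Pass when the snippet appears somewhere in the output."""
    return lambda output: normalize(snippet) in normalize(output)


def absent(snippet: str) -> Callable[[str], bool]:
    """Pass when the snippet (e.g. boilerplate) does not appear."""
    return lambda output: normalize(snippet) not in normalize(output)


def run_checks(output: str, checks: list[Check]) -> dict[str, bool]:
    """Run every check against one OCR output."""
    return {check.name: check.passed(output) for check in checks}


# Hypothetical checks for one page of a document.
checks = [
    Check("equation is transcribed", present("E = mc^2")),
    Check("table holds the right value", present("$1.2M")),
    Check("page footer is removed", absent("Page 3 of 12")),
]

# Hypothetical OCR output in a style that differs from any fixed reference.
output = "Year: 2023\nRevenue:   $1.2M\n\nEnergy is given by $E = mc^2$."

results = run_checks(output, checks)
pass_rate = sum(results.values()) / len(results)
```

Each check is a yes/no question about the output, so an engine is free to choose its own formatting as long as the content is right.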

On these rigorous tests, olmOCR outperforms all other models we compared it against.

olmOCR is also now easier to use than ever:

- Simpler installation
- Prebuilt Docker containers
- Upgraded to the latest vLLM version (support for quantization coming soon!)
- Better performance with improved sampling, tweaked retry strategy, and a cleaner prompt!

Run olmOCR-bench yourself: github.com/allenai/olmocr…

OCR your own documents: github.com/allenai/olmocr

Try the olmOCR online demo: olmocr.allenai.org

Read our updated technical report: olmocr.allenai.org/papers/olmocr.…
