Matt Dancho (Business Science)
@mdancho84
🚨 BREAKING: IBM launches a free Python library that converts ANY document to data

Introducing Docling. Here's what you need to know: 🧵
1. What is Docling?

Docling is a Python library that simplifies document processing, parsing diverse formats (including advanced PDF understanding) and providing seamless integrations with the gen AI ecosystem.
2. Document Conversion Architecture

For each document format, the document converter knows which format-specific backend to employ for parsing the document and which pipeline to use for orchestrating the execution, along with any relevant options.
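The dispatch idea above can be illustrated with a minimal sketch. This is not Docling's actual code: the registry, the `FormatOption` class, and every backend and pipeline name below are hypothetical, chosen only to show how a converter might map a format to a backend and a pipeline.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FormatOption:
    """Hypothetical pairing of a parsing backend with an orchestration pipeline."""
    backend: str
    pipeline: str


# Hypothetical registry: one entry per supported file extension.
FORMAT_OPTIONS = {
    ".pdf": FormatOption(backend="pdf_backend", pipeline="pdf_pipeline"),
    ".docx": FormatOption(backend="docx_backend", pipeline="simple_pipeline"),
    ".html": FormatOption(backend="html_backend", pipeline="simple_pipeline"),
}


def select_option(filename: str) -> FormatOption:
    """Pick the backend and pipeline for a file based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return FORMAT_OPTIONS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {suffix or filename!r}") from None


if __name__ == "__main__":
    option = select_option("report.PDF")
    print(option.backend, option.pipeline)
```

Keeping the mapping in one table means supporting a new format only requires a new registry entry, which is one plausible reason for this kind of design.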
3. PDF Conversion to Markdown

Here is an example of the DocLayNet paper from arXiv, converted into Markdown format by Docling.
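A conversion like this one can be reproduced with a few lines of Python. The sketch below assumes Docling is installed (`pip install docling`) and that the DocLayNet paper is available at the arXiv URL shown; the helper name `convert_to_markdown` is ours, not part of the library.

```python
def convert_to_markdown(source: str) -> str:
    """Convert a document (local path or URL) to Markdown using Docling."""
    # Imported lazily so this file can be loaded even if Docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


if __name__ == "__main__":
    # Assumed arXiv location of the DocLayNet paper.
    print(convert_to_markdown("https://arxiv.org/pdf/2206.01062"))
```

The first run may take a while, since Docling typically downloads its layout and table models before converting.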
4. Core Technology

Docling includes:

- PDF Backends for parsing
- Layout Analysis Model
- Vision-Based Table Formatter
- OCR for Text
🚨 Want to learn how to build + ship AI and Data Science projects (that businesses actually want)?

On April 15th, I am hosting a free workshop to help you get started with AI + DS projects in Python.

Register here (500 seats): learn.business-science.io/ai-register