docs: extend integration docs & README #456

Merged
merged 1 commit into from
Nov 28, 2024
24 changes: 20 additions & 4 deletions README.md
@@ -4,7 +4,7 @@
</a>
</p>

# Docling
# 🦆 Docling

<p align="center">
<a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
@@ -29,7 +29,7 @@ Docling parses documents and exports them to the desired format with ease and sp
* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
* 📑 Advanced PDF document understanding including page layout, reading order & table structures
* 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
* 🔍 OCR support for scanned PDFs
* 💻 Simple and convenient CLI

@@ -65,8 +65,24 @@ result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
```

Check out [Getting started](https://ds4sd.github.io/docling/).
You will find lots of tuning options to leverage all the advanced capabilities.
More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in
the docs.

## Documentation

Check out Docling's [documentation](https://ds4sd.github.io/docling/) for details on
installation, usage, concepts, recipes, extensions, and more.

## Examples

Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/),
demonstrating how to address different application use cases with Docling.

## Integrations

To further accelerate your AI application development, check out Docling's native
[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks
and tools.

## Get help and support

Binary file added docs/assets/docling_ecosystem.png
Binary file added docs/assets/docling_ecosystem.pptx
Binary file not shown.
4 changes: 1 addition & 3 deletions docs/index.md
@@ -1,5 +1,3 @@
# Docling

<p align="center">
<img loading="lazy" alt="Docling" src="assets/docling_processing.png" width="100%" />
<a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
@@ -23,7 +21,7 @@ Docling parses documents and exports them to the desired format with ease and sp
* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
* 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
* 🔍 OCR support for scanned PDFs
* 💻 Simple and convenient CLI

9 changes: 9 additions & 0 deletions docs/integrations/bee.md
@@ -0,0 +1,9 @@
Docling is available as an extraction backend in the [Bee][github] framework.

- 💻 [Bee GitHub][github]
- 📖 [Bee Docs][docs]
- 📦 [Bee NPM][package]

[github]: https://github.com/i-am-bee
[docs]: https://i-am-bee.github.io/bee-agent-framework/
[package]: https://www.npmjs.com/package/bee-agent-framework
5 changes: 5 additions & 0 deletions docs/integrations/index.md
@@ -1 +1,6 @@
Use the navigation on the left to browse through Docling integrations with popular frameworks and tools.


<p align="center">
<img loading="lazy" alt="Docling" src="../assets/docling_ecosystem.png" width="100%" />
</p>
17 changes: 17 additions & 0 deletions docs/integrations/instructlab.md
@@ -0,0 +1,17 @@
Docling is powering document processing in [InstructLab](https://instructlab.ai/),
enabling users to unlock the knowledge hidden in documents and present it to
InstructLab's fine-tuning for aligning AI models to the user's specific data.

More details can be found in this [blog post][blog].

- 🏠 [InstructLab Home][home]
- 💻 [InstructLab GitHub][github]
- 🧑🏻‍💻 [InstructLab UI][ui]
- 📖 [InstructLab Docs][docs]
<!-- - 📝 [Blog post]() -->

[home]: https://instructlab.ai
[github]: https://github.com/instructlab
[ui]: https://ui.instructlab.ai/
[docs]: https://docs.instructlab.ai/
[blog]: https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
9 changes: 9 additions & 0 deletions docs/integrations/prodigy.md
@@ -0,0 +1,9 @@
Docling is available in [Prodigy][home] as a [Prodigy-PDF plugin][plugin] recipe.

- 🌐 [Prodigy Home][home]
- 🔌 [Prodigy-PDF Plugin][plugin]
- 🧑🏽‍🍳 [pdf-spans.manual Recipe][recipe]

[home]: https://prodi.gy/
[plugin]: https://prodi.gy/docs/plugins#pdf
[recipe]: https://prodi.gy/docs/plugins#pdf-spans.manual
2 changes: 2 additions & 0 deletions docs/integrations/spacy.md
@@ -1,3 +1,5 @@
# spaCy

Docling is available in [spaCy](https://spacy.io/) as the "SpaCy Layout" plugin:

- 💻 [SpacyLayout GitHub][github]
2 changes: 2 additions & 0 deletions docs/overrides/main.html
@@ -1,5 +1,7 @@
{% extends "base.html" %}

{#
{% block announce %}
<p>🎉 Docling has gone v2! <a href="{{ 'v2' | url }}">Check out</a> what's new and how to get started!</p>
{% endblock %}
#}
9 changes: 6 additions & 3 deletions mkdocs.yml
@@ -52,8 +52,8 @@ theme:
- search.suggest
- toc.follow
nav:
- Get started:
- Home: index.md
- Home:
- "🦆 Docling": index.md
- Installation: installation.md
- Usage: usage.md
- CLI: cli.md
@@ -85,10 +85,13 @@ nav:
# - CLI: examples/cli.md
- Integrations:
- Integrations: integrations/index.md
- "🐝 Bee": integrations/bee.md
- "Data Prep Kit": integrations/data_prep_kit.md
- "DocETL": integrations/docetl.md
- "🐶 InstructLab": integrations/instructlab.md
- "Kotaemon": integrations/kotaemon.md
- "LlamaIndex 🦙": integrations/llamaindex.md
- "🦙 LlamaIndex": integrations/llamaindex.md
- "Prodigy": integrations/prodigy.md
- "spaCy": integrations/spacy.md
# - "LangChain 🦜🔗": integrations/langchain.md
# - API reference: