Writing picture enrichment annotations to Markdown file #625

theobgbd · 2024-12-18T16:00:23Z

Image annotations to MD

Following the Figure Enrichment tutorial, it is easy to add classification metadata to an image through the element.annotations.append(data) call.

However, this data is not written out when exporting to Markdown. Would there be a way to write it alongside the image path in the final MD document? It would be great for our RAG application.

gauravmindzk · 2025-01-08T10:03:09Z

Hi @theobgbd,
I'm currently exploring PDF parsers for my PDF RAG app.

What does Picture Enrichment mean in Docling?

I'm searching for a way to add context to the images in my PDFs, which will eventually help with image summarization.

Is picture enrichment the answer?

dolfim-ibm · 2025-01-08T11:27:30Z

Picture enrichment currently has specific typing for classification, description (like model captioning), charts, chemical structures, and a generic one, PictureMiscData (see https://github.com/DS4SD/docling-core/blob/main/docling_core/types/doc/document.py#L261).

We have some ideas on how to enable the serialization of that data when exporting to Markdown or other formats.

theobgbd · 2025-01-08T15:13:27Z

> I'm searching for a way to add context to images of my pdfs which will eventually help in image summarization.
>
> Is picture enrichment the answer ?

@gauravmindzk What I meant by "picture enrichment" was model captioning with a vision LLM, with added context from the document. For my use case, I used [this template](https://ds4sd.github.io/docling/examples/develop_picture_enrichment/) as a base to set up a call to an on-premise Pixtral instance that captions the images, and then stored the answer in the annotations field.
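A rough sketch of that flow, for illustration only. `Picture` is a hypothetical stand-in, not docling's picture item class, and `caption_image` stands in for the request to the vision model; the actual Pixtral call is not shown and `fake_captioner` just returns a canned string.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Picture:
    """Hypothetical stand-in for a picture element with an annotations list."""
    uri: str
    annotations: List[dict] = field(default_factory=list)


def enrich_with_caption(
    picture: Picture,
    context: str,
    caption_image: Callable[[str, str], str],
) -> None:
    """Ask the captioner for a description and store it on the picture."""
    caption = caption_image(picture.uri, context)
    picture.annotations.append({"kind": "description", "text": caption})


def fake_captioner(uri: str, context: str) -> str:
    # Placeholder for a vision-LLM request (assumption: takes the image
    # location plus surrounding document context, returns a caption).
    return f"Figure from section '{context}'"


pic = Picture(uri="images/fig1.png")
enrich_with_caption(pic, "Results", fake_captioner)
```

Passing the captioner in as a callable keeps the model endpoint swappable, so the same loop can be tested without a running model.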

> Picture enrichment currently has specific typing for classification, description (like model captioning), charts, chemical structures, and a generic one PictureMiscData https://github.com/DS4SD/docling-core/blob/main/docling_core/types/doc/document.py#L261.
>
> We have some idea on how to enable the serialization of that data when exporting to markdown or other formats.

@dolfim-ibm Thanks for your feedback, I didn't know about the PictureMiscData field. I forked the docling_core project and wrote a crude "ANNOTATION" Markdown export mode that writes the content of the annotations as the "alt" description of the image in the MD file. That solution fits my needs for now, but it's nice to know that you plan to serialize it in the future.
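The alt-text idea could look roughly like the sketch below. This is not the code from the fork and does not use docling_core classes: `AnnotatedPicture`, `escape_alt`, and `to_markdown_image` are hypothetical names, and annotations are assumed to be plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnnotatedPicture:
    """Hypothetical stand-in: an image path plus free-text annotations."""
    uri: str
    annotations: List[str] = field(default_factory=list)


def escape_alt(text: str) -> str:
    """Make text safe for the alt slot of a Markdown image."""
    # Collapse newlines and repeated spaces so the alt text stays on one line.
    text = " ".join(text.split())
    # Escape the backslash first, then the brackets that delimit alt text.
    for ch in ("\\", "[", "]"):
        text = text.replace(ch, "\\" + ch)
    return text


def to_markdown_image(picture: AnnotatedPicture) -> str:
    """Render the picture as a Markdown image with annotations as alt text."""
    alt = escape_alt("; ".join(picture.annotations)) or "image"
    return f"![{alt}]({picture.uri})"


example = AnnotatedPicture(
    "images/fig1.png", ["bar chart", "Revenue by\nquarter [2023]"]
)
markdown_line = to_markdown_image(example)
```

Escaping matters here because an unescaped `]` in a caption would end the alt text early and break the image link.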

theobgbd added the "question" label (Further information is requested) on Dec 18, 2024
