Replies: 5 comments
-
Looks like this is currently not possible, see code: That said, we already have IOcrEngine (https://github.com/microsoft/kernel-memory/blob/main/service/Abstractions/DataFormats/IOcrEngine.cs) in place, which would be enough for simple text extraction, and UglyToad.PdfPig can extract images as an experimental feature. @dluc Wouldn't it be possible to extend "FileContent" with an array of the images found in the PDF, described by the GPT-4 Vision API if enabled?
-
I think you can support this scenario once issue #379 is completed (there is currently a PR in preview). With that, you will be able to inject a custom decoder for PDF files.
-
Given that custom content decoders can now be injected, I would first try creating one that replaces the default PDF decoder and internally does all the work of extracting both the PDF text and the text found in images. E.g. you can create a decoder that depends on the existing image decoder to parse images and returns all the text at the end, without the need to revisit the FileContent class (for now).
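
A minimal sketch of that composition, written in Python only to keep it short and self-contained. Every name here (`PdfPage`, `PdfWithImagesDecoder`, `parse_pdf`, `ocr`) is hypothetical and is not the Kernel Memory API; the assumption is simply that a PDF parser yields per-page text plus raw image bytes, and that an OCR engine maps image bytes to text.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical stand-in for what a PDF parser returns per page.
@dataclass
class PdfPage:
    number: int
    text: str                                           # text layer of the page
    images: List[bytes] = field(default_factory=list)   # raw embedded images


# Assumption: an OCR engine is modeled as a function from image bytes to text.
OcrEngine = Callable[[bytes], str]
PdfParser = Callable[[bytes], List[PdfPage]]


class PdfWithImagesDecoder:
    """Hypothetical decoder: page text plus OCR text from the page's images."""

    def __init__(self, parse_pdf: PdfParser, ocr: OcrEngine):
        # Both dependencies are injected, so the decoder can reuse an
        # existing image/OCR component instead of reimplementing it.
        self._parse_pdf = parse_pdf
        self._ocr = ocr

    def supports_mime_type(self, mime_type: str) -> bool:
        return mime_type == "application/pdf"

    def decode(self, data: bytes) -> List[str]:
        """Return one text section per page, skipping empty pieces."""
        sections = []
        for page in self._parse_pdf(data):
            parts = [page.text.strip()]
            parts += [self._ocr(image).strip() for image in page.images]
            sections.append("\n".join(p for p in parts if p))
        return sections
```

The caller only ever sees plain text sections, which is why this approach would not require changing the FileContent class: the image handling stays an internal detail of the decoder.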
-
Is any work being done on this? My company desperately needs this functionality, and my quick solution would be to simply extract the images from the PDF first, then send PDF + images to KernelMemory. But this sounds exactly like the solution @dluc is proposing above (only, outside of KM). I'd much rather help contribute to KernelMemory than create my own one-off solution.
-
Any update on OCR extraction for PDFs? A customer has a bunch of PDF docs generated from a scanning solution.
-
Context / Scenario
I referred to this example and wrote an OCR implementation. When I tried to process a plain PDF and a PDF containing images, the OCR implementation was not triggered in either case. I'm not sure whether I did something wrong.