Group relevant information into one block #10500
Comments
I can generally understand what you mean, but the link you shared seems to be dead. Can you re-share it?
From what I understand, you are trying to group the OCR results into their appropriate blocks, where the image is divided into different areas such as text, table, and figure, and each text area contains multiple lines of text. Is that right? You could try the layout recovery module in ppstructure, which can restore the input image to a Word or PDF file with the same layout as the original image.
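To make the idea of "grouping lines into blocks" concrete, here is a minimal, library-independent sketch. It is not the ppstructure layout recovery module and does not call any PaddleOCR API. It assumes each recognized line has already been reduced to an axis-aligned box `(x_min, y_min, x_max, y_max)` plus its text. The names `Line` and `group_lines_into_blocks`, and the two thresholds, are hypothetical choices for illustration that would likely need tuning on real documents.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels, origin at top-left.
Box = Tuple[float, float, float, float]


@dataclass
class Line:
    """One recognized text line (hypothetical container, not a PaddleOCR type)."""
    box: Box
    text: str


def group_lines_into_blocks(
    lines: List[Line],
    max_gap_ratio: float = 0.6,   # allowed vertical gap, as a fraction of line height
    min_x_overlap: float = 0.3,   # required horizontal overlap, as a fraction of the narrower line
) -> List[List[Line]]:
    """Greedily attach each line to a block whose last line sits just above it."""
    blocks: List[List[Line]] = []
    # Process lines top to bottom, then left to right.
    for line in sorted(lines, key=lambda l: (l.box[1], l.box[0])):
        for block in blocks:
            last = block[-1]
            height = last.box[3] - last.box[1]
            gap = line.box[1] - last.box[3]
            overlap = min(line.box[2], last.box[2]) - max(line.box[0], last.box[0])
            narrower = min(line.box[2] - line.box[0], last.box[2] - last.box[0])
            close_enough = gap <= max_gap_ratio * height
            same_column = narrower > 0 and overlap >= min_x_overlap * narrower
            if close_enough and same_column:
                block.append(line)
                break
        else:
            # No existing block accepted the line, so it starts a new one.
            blocks.append([line])
    return blocks


def block_texts(blocks: List[List[Line]]) -> List[str]:
    """Join the lines of each block into a single string."""
    return [" ".join(line.text for line in block) for block in blocks]
```

For example, two stacked lines in the left column would end up in one block, while a line in a separate right-hand column or a line far below would each start a new block. A heuristic like this only handles simple layouts; for tables and figures, a layout analysis model such as the one suggested above is the more suitable tool.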
@ToddBear Thank you for paying attention to my question. Please check the link again.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
I figured out how to extract text from a PDF file using this library, but I would like to improve on that.
The recognized text is not returned in a structured form, and I would like the output to be grouped into blocks.
Perhaps I have not explained this clearly; more details are available at this link: https://stackoverflow.com/questions/76787641/get-data-in-the-form-of-blocks-when-recognizing-a-file
I would be grateful if you could tell me how to implement this idea.