Group relevant information into one block #10500

paulgromy · 2023-07-28T13:23:55Z

I figured out how to extract text from a PDF file using this library, but I would like to improve on that.
The recognized text is returned without any structure, and I would like the output to be grouped into blocks.

Perhaps I have not explained this clearly; you can find more details at this link: https://stackoverflow.com/questions/76787641/get-data-in-the-form-of-blocks-when-recognizing-a-file

I would be grateful if you could tell me how to implement this.

ToddBear · 2023-08-02T03:07:22Z

I think I understand what you mean in general, but the link you shared seems to be dead. Could you share it again?

shiyutang · 2023-08-02T03:24:57Z

Hi there, did you try ppstructure?

For example, given the image on the left, it produces the result on the right.

ToddBear · 2023-08-02T03:41:25Z

From what I understand, you are trying to group the OCR results into the appropriate blocks, as shown below?

Here the image is divided into different areas such as text, table, and figure, and each text area contains multiple lines of text.

You could try the layout recovery module in ppstructure, which can restore the input image to a Word or PDF file with the same layout as the original image.
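
To make the grouping idea concrete, here is a minimal sketch in plain Python. It does not call ppstructure; it assumes you already have layout regions as dicts with `type` and `bbox` keys and OCR lines as `(text, bbox)` pairs, with every box given as `(x_min, y_min, x_max, y_max)`. The names `Block` and `group_lines_into_blocks` are hypothetical, introduced only for this illustration, and the output format of your layout tool may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Block:
    """One layout region plus the OCR lines assigned to it."""
    type: str
    bbox: Box
    lines: List[str] = field(default_factory=list)


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def contains(box: Box, point: Tuple[float, float]) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    x0, y0, x1, y1 = box
    px, py = point
    return x0 <= px <= x1 and y0 <= py <= y1


def group_lines_into_blocks(
    regions: List[Dict],
    ocr_lines: List[Tuple[str, Box]],
) -> Tuple[List[Block], List[str]]:
    """Assign each OCR line to the first region containing its center.

    Lines are visited top-to-bottom, then left-to-right, so the text inside
    each block ends up in reading order. Lines that fall outside every
    region are returned separately instead of being dropped.
    """
    blocks = [Block(r["type"], tuple(r["bbox"])) for r in regions]
    unassigned: List[str] = []
    ordered = sorted(ocr_lines, key=lambda item: (item[1][1], item[1][0]))
    for text, box in ordered:
        point = center(box)
        target = next((b for b in blocks if contains(b.bbox, point)), None)
        if target is None:
            unassigned.append(text)
        else:
            target.lines.append(text)
    return blocks, unassigned
```

Matching on the center of each line is a deliberately simple choice: it tolerates line boxes that slightly overflow their region, but it will misassign lines when regions overlap, in which case an overlap-area test may work better.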

paulgromy · 2023-08-02T08:12:30Z

@ToddBear Thank you for looking into my question. Please check the link again.

github-actions · 2024-01-03T02:42:21Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

paulgromy added the "Code PR is needed" label (This issue could inspire a code PR) Jul 28, 2023

paulgromy assigned shiyutang Jul 28, 2023

paddle-bot bot assigned andyjiang1116 Jul 28, 2023

shiyutang unassigned andyjiang1116 Aug 2, 2023

ToddBear mentioned this issue Aug 23, 2023

🏅️ PaddlePaddle Toolkit Happy Open Source Regular Competition #10223

Closed

github-actions bot added the stale label Jan 3, 2024

github-actions bot closed this as completed Jan 24, 2024

paddle-bot bot added the status/close label Jan 24, 2024
