How to return properly aligned text from a skewed image? #1810

VMM-MMV · 2024-12-03T11:02:37Z

I am using doctr to perform OCR on a skewed image.

Although the OCR accurately recognizes the words, the text returned is organized based on the coordinates of the skewed image. As a result, when I try to combine the words into a coherent string, the output becomes complete gibberish.

The main problem is in the lines:

World War ml or the
was a global conflict Second World War (1
the world's
between two coalitions: September 1939 - 2

Instead of following the actual lines as they appear in the image, it seems to group words along a horizontal axis, so words from different skewed lines end up on the same output line.

I tried sorting in different ways and using different parameters, but every attempt fell short.
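
To illustrate the grouping problem described above, here is a minimal pure-Python sketch. It is not docTR's actual logic: it assumes each word is reduced to a center point and that the skew angle is already known. The names `rotate`, `group_into_lines`, `skew_deg`, `line_tol`, and the sample coordinates are all hypothetical.

```python
import math


def rotate(x, y, angle_deg):
    """Rotate the point (x, y) about the origin by angle_deg."""
    theta = math.radians(angle_deg)
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def group_into_lines(words, skew_deg=0.0, line_tol=0.03):
    """Group (text, x, y) word centers into lines of text.

    Centers are rotated by -skew_deg first, so lines skewed by skew_deg
    become horizontal. A new line starts whenever the gap between
    consecutive y values exceeds line_tol.
    """
    points = []
    for text, x, y in words:
        xr, yr = rotate(x, y, -skew_deg)
        points.append((yr, xr, text))
    points.sort()  # top to bottom

    lines, prev_y = [], None
    for yr, xr, text in points:
        if prev_y is None or yr - prev_y > line_tol:
            lines.append([])
        lines[-1].append((xr, text))
        prev_y = yr
    # Within each line, read left to right
    return [" ".join(text for _, text in sorted(line)) for line in lines]


# Hypothetical sample: two straight lines of text, then skewed by 10 degrees
straight = [("World", 0.1, 0.10), ("War", 0.5, 0.10), ("II", 0.9, 0.10),
            ("was", 0.1, 0.15), ("a", 0.5, 0.15), ("global", 0.9, 0.15)]
skewed = [(text, *rotate(x, y, 10)) for text, x, y in straight]

print(group_into_lines(skewed))               # words from both lines get mixed
print(group_into_lines(skewed, skew_deg=10))  # words follow the original lines
```

Grouping purely by the y coordinate only works once the lines are horizontal, which is why sorting on its own does not help: the skew has to be undone first, either on the coordinates or on the image.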

What I found to work is to de-skew the image with another library and then OCR the result.

Now I get proper lines:

World War ml or the Second World War (1 September 1939 - 2 September 1945)
was a global conflict between two coalitions: the Allies and the Axis powers. Nearly all
the worid's countres--including all the great powers--particpated, with many investing
all available economic, industrial, and scientific capabilities in pursuit of total war,

This works, but I am certain there must be a way to do this directly in doctr without de-skewing first.

Code:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor


def read_pdf(file_path):
    model = ocr_predictor(
        det_arch='db_resnet50',
        reco_arch='crnn_vgg16_bn',
        pretrained=True,
        export_as_straight_boxes=True,
        detect_orientation=True
    )

    doc = DocumentFile.from_pdf(file_path)
    result = model(doc)

    full_text = []
    for page in result.pages:
        page_lines = []
        for block in page.blocks:
            for line in block.lines:
                # Order the words left to right by the x-coordinate of their first box corner
                words = sorted(line.words, key=lambda w: w.geometry[0][0])
                page_lines.append(' '.join(word.value for word in words))

        # One detected line per output line
        full_text.append('\n'.join(page_lines))

    return '\n'.join(full_text)

Answered by felixdittrich92

felixdittrich92 · 2024-12-03T12:26:00Z

Maintainer

Hi @VMM-MMV 👋,

You can pass some arguments to ocr_predictor to achieve this:

from doctr.models import ocr_predictor

predictor = ocr_predictor(
    pretrained=True,
    # Document-related parameters
    assume_straight_pages=False,
    straighten_pages=True,  # Deskews the page under the hood
    export_as_straight_boxes=True,
    detect_orientation=True,
    # Orientation-specific parameters, used in combination with `assume_straight_pages=False` and/or `straighten_pages=True`
    disable_crop_orientation=True,  # Should be False if words inside the doc are multi-oriented
    disable_page_orientation=True,  # Should be False if the page may be rotated outside the -45 to 45 degree range
)

The corrected images can be grabbed from the output:

result = predictor(doc)
# list of numpy arrays containing the corrected images
corrected_images = [page.page for page in result.pages]

Best,
Felix
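
Once the pages are straightened, the recognized words can be joined line by line. The sketch below works on the nested dictionary returned by `result.export()` (pages, then blocks, then lines, then words, each word carrying a `value`). The helper name `export_to_text`, its separator parameters, and the trimmed-down sample dictionary are hypothetical, so treat this as an illustration rather than the library's own method; the built-in `result.render()` may already be enough for plain text.

```python
def export_to_text(exported, line_sep="\n", page_sep="\n\n"):
    """Join the words of an exported OCR result into plain text."""
    pages = []
    for page in exported["pages"]:
        lines = []
        for block in page["blocks"]:
            for line in block["lines"]:
                # Words are taken in the order the predictor returned them
                lines.append(" ".join(word["value"] for word in line["words"]))
        pages.append(line_sep.join(lines))
    return page_sep.join(pages)


# Hypothetical, trimmed-down example of the exported structure
sample = {
    "pages": [{
        "blocks": [{
            "lines": [
                {"words": [{"value": "World"}, {"value": "War"}]},
                {"words": [{"value": "was"}, {"value": "a"},
                           {"value": "global"}, {"value": "conflict"}]},
            ]
        }]
    }]
}

print(export_to_text(sample))
```

With a real result this would be called as `export_to_text(result.export())`, where `result` comes from the predictor configured above.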

VMM-MMV · 2024-12-04T00:12:19Z

Author

Thank you! straighten_pages=True solved the issue.
