
Batch-prediction across multiple GPUs and more efficient patch-prediction #48

Open

apacha wants to merge 14 commits into master
Conversation

@apacha (Contributor) commented Aug 30, 2022

In order to batch-binarize thousands of images, I've rewritten the prediction script to allow us to predict around 1500-2000 images per hour on a decent machine with two GPUs.

The proposed changes include:

  • An efficient, vectorized way to compute the image patches, replacing a very inefficient loop (see the sketch right after this list)
  • Complete removal of the prediction on the down-scaled image, as its results are almost always worse
  • Batch-prediction code that binarizes an entire directory into a given output directory while preserving the folder structure and skipping images that have already been binarized, so the conversion can be stopped and resumed
  • Multiprocessing batch-prediction across multiple GPUs using the mpire library (a sketch follows below)
  • A fix for the memory leak that made mass-binarization crash very quickly because the GPU ran out of memory. With this fix, we have already been running the conversion for 16 hours without a crash.
  • Simplified model loading, removing obsolete session-handling code
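
For illustration, a minimal sketch of the vectorized patch computation using tf.image.extract_patches (the op the PR uses, per the discussion further down); the patch size, function name, and shapes are illustrative:

    import tensorflow as tf

    def compute_patches(image, patch_size=448):  # patch size is illustrative
        # image: a single HxWx3 tensor; the op expects a batch dimension.
        batched = tf.expand_dims(image, 0)
        patches = tf.image.extract_patches(
            images=batched,
            sizes=[1, patch_size, patch_size, 1],
            strides=[1, patch_size, patch_size, 1],  # non-overlapping tiling
            rates=[1, 1, 1, 1],
            padding="SAME",
        )
        # Each output location holds one flattened patch; unflatten into a
        # batch of patch images ready for model prediction.
        return tf.reshape(patches, [-1, patch_size, patch_size, 3])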

Please note:
I know that the code looks completely different now (hopefully more readable) and is probably not 1:1 compatible with the remaining code in your repository, but I tried to put all the relevant changes into this PR and make the code as self-contained as possible to allow you to update the solution as you see fit.
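
For illustration, a minimal sketch of multi-GPU distribution with mpire, assuming one worker process per GPU; the import path and the load_model/run method names are assumptions, not necessarily the PR's exact API:

    import os
    from mpire import WorkerPool

    image_paths = ["page1.png", "page2.png"]  # illustrative input list

    def worker_init(worker_id, worker_state):
        # Pin each worker process to one GPU before TensorFlow is loaded,
        # so every worker only allocates memory on its own device.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(worker_id)
        from sbb_binarize.sbb_binarize import SbbBinarizer  # import path assumed
        binarizer = SbbBinarizer()
        binarizer.load_model("/path/to/model")  # assumed method name
        worker_state["binarizer"] = binarizer

    def binarize_one(worker_id, worker_state, image_path):
        worker_state["binarizer"].run(image_path=image_path, use_patches=True,
                                      save=image_path + ".bin.png")

    with WorkerPool(n_jobs=2, pass_worker_id=True, use_worker_state=True) as pool:
        pool.map(binarize_one, image_paths, worker_init=worker_init)

The memory-leak fix itself is not shown here. A common mitigation in long-running Keras prediction loops is to call model.predict_on_batch (or the model directly) instead of model.predict, which is known to accumulate state when invoked repeatedly; whether the PR uses exactly this approach is only visible in the diff.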

Thanks for sharing the code-base with us. I hope that this PR is of some help to you.

apacha added 3 commits August 30, 2022 10:58
…w type-hints and improved the code-style a little bit by running an auto-formatter on the entire file.
… efficient way and adding support for batch-conversion with multiple GPUs.
@vahidrezanezhad (Member)

Dear @apacha,

A few things, such as possible OOM errors, binarization with a collection of models, and the exact processing-time improvement, will be tested before merging your PR. By the way, as the Qurator team we would like to thank you for your efforts to improve the sbb_binarization tool.

@vahidrezanezhad (Member)


Dear @apacha,

I got this error with your pull request:

Traceback (most recent call last):
  File "/home/vahid/Documents/sbb_binarization/bin_envnew/bin/sbb_binarize", line 8, in <module>
    sys.exit(main())
  File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/vahid/Documents/sbb_binarization/bin_envnew/lib64/python3.6/site-packages/sbb_binarize/cli.py", line 15, in main
    SbbBinarizer(model_dir).run(image_path=input_image, use_patches=patches, save=output_image)
TypeError: __init__() takes 1 positional argument but 2 were given

@apacha (Contributor, Author) commented Sep 15, 2022

You get this exception because I've rewritten the SbbBinarizer class to separate loading the model from initializing the object; see
https://github.com/qurator-spk/sbb_binarization/pull/48/files#diff-3b757c0dba09b4add9cf8c0566db9834fdb19e6af8e5aaf538e8810b1ac613eeR164
I'll update the cli.py as well.
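
For illustration, a minimal sketch of the API change that triggers the TypeError above; load_model is an assumed method name, not necessarily the PR's exact API:

    # Old CLI call: the model directory was passed to the constructor.
    # SbbBinarizer(model_dir)  # now raises TypeError: __init__ takes no arguments

    # New usage (sketch): construct first, then load the model explicitly.
    binarizer = SbbBinarizer()
    binarizer.load_model(model_dir)  # assumed method name, see the linked diff
    binarizer.run(image_path=input_image, use_patches=patches, save=output_image)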

@bertsky (Contributor) commented Feb 14, 2023

See apacha#1 for an update of the PR.

@bertsky (Contributor) commented Feb 14, 2023

Not sure if this is the right place for a discussion, but IMO this is not the right approach to efficient prediction yet. We should define a tf.data pipeline, allowing pipelining between our (intensive!) CPU pre- and post-processing and the GPU side; a minimal sketch follows below. On that occasion, multithreading on the CPU side (reshaping and contour finding with OpenCV) should be attempted, too...
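
A minimal sketch of the kind of tf.data pipeline meant here, assuming PNG inputs of uniform size and an already-loaded Keras model named model; all names, globs, and sizes are illustrative:

    import tensorflow as tf

    def preprocess(path):
        # CPU-side work (decoding, scaling) runs in parallel worker threads.
        data = tf.io.read_file(path)
        image = tf.io.decode_png(data, channels=3)
        return tf.image.convert_image_dtype(image, tf.float32)

    paths = tf.data.Dataset.list_files("images/*.png")  # illustrative glob
    batches = (paths
               .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
               .batch(8)                      # assumes uniform image sizes
               .prefetch(tf.data.AUTOTUNE))   # overlap CPU work with GPU inference

    # predictions = model.predict(batches)    # model assumed loaded elsewhere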

Also, perhaps at least some of the OpenCV stuff can be ported to cv2.cuda calls. I know it would still involve multiple additional CPU-GPU transfers (since cv2.cuda does not run in the same graph as the tf/keras part), but at least the GPU would be utilised a bit more; a rough sketch follows.
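
A rough sketch of what such a port could look like, assuming an OpenCV build with CUDA support; the chosen operations and sizes are only illustrative:

    import cv2

    img = cv2.imread("page.png")  # illustrative input

    gpu = cv2.cuda_GpuMat()
    gpu.upload(img)                                         # CPU -> GPU transfer
    gpu_gray = cv2.cuda.cvtColor(gpu, cv2.COLOR_BGR2GRAY)   # runs on the GPU
    gpu_small = cv2.cuda.resize(gpu_gray, (1024, 1536))     # illustrative size
    result = gpu_small.download()                           # GPU -> CPU transfer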

@bertsky (Contributor) commented Feb 15, 2023

Sorry, in my previous comment I was thinking more about Eynollah than the binarizer (hence the heavy CPU part). And @apacha's PR does already give a speed-up of an order of magnitude. I can see minor differences between the results from tf.image.extract_patches and the old numpy tiling, but the quality is still very good. (I can provide before/after examples, both for the old ensemble model and the 2021 version; let me know if you are interested.) My argument was only meant about the direction of future work we should undertake.

@bertsky (Contributor) commented Feb 15, 2023

tf.data pipelining with heavy CPU processing itself seems to be hard to get right: to get true parallelisation, one probably needs tfaip...
