Batch-prediction across multiple GPUs and more efficient patch-prediction #48
base: master
Conversation
…w type-hints and improved the code style a little bit by running an auto-formatter on the entire file.
… efficient way and adding support for batch-conversion with multiple GPUs.
Dear @apacha, a few things, such as a possible OOM error, binarization with a collection of models, and the exact processing-time improvement, will be tested before merging your PR. By the way, as the Qurator team, we would like to thank you for your efforts in improving the sbb_binarization tool.
Dear @apacha, I got this error with your pull request: Traceback (most recent call last): …
You get this exception because I've rewritten the SbbBinarizer class to separate loading the model from initializing the object, see …
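For illustration, here is a minimal sketch of what such a split between construction and model loading could look like; the class and method names follow the discussion above, but the body is an assumption, not the actual PR code:

```python
from tensorflow import keras

class SbbBinarizer:
    def __init__(self) -> None:
        # Construction is cheap now: no model is loaded here.
        self.model = None

    def load_model(self, model_path: str) -> None:
        # Loading the model is a separate, explicit step.
        self.model = keras.models.load_model(model_path, compile=False)

    def binarize(self, image):
        # Guard against callers that skip the new loading step.
        if self.model is None:
            raise RuntimeError("Call load_model() before binarize().")
        # ... patch extraction and prediction would happen here ...
```

Any caller (such as cli.py) therefore has to construct the object first and then call load_model() explicitly, which is most likely what the traceback above stems from.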
… in an efficient way. Updated cli.py to correctly load and initialize the changed SbbBinarizer class.
# Conflicts:
#	README.md
#	sbb_binarize/cli.py
#	sbb_binarize/sbb_binarize.py
See apacha#1 for an update of the PR.
Not sure if this is the right place for a discussion, but IMO this is not the right approach for efficient prediction yet. We should define a tf.data pipeline, allowing pipelining between our (intensive!) CPU pre- and postprocessing and the GPU side. On that occasion, multithreading on the CPU side (reshaping and contour finding with OpenCV) should be attempted, too. Also, perhaps at least some of the OpenCV work can be ported to cv2.cuda calls. I know it would still involve multiple additional CPU-GPU transfers (since cv2.cuda is not in the same graph as the tf/keras part), but at least the GPU would be utilised a bit more.
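To make the suggestion concrete, here is a hedged sketch of such a tf.data pipeline; the glob pattern, image size, and model path are placeholders, and the pre-/postprocessing is reduced to a stub:

```python
import tensorflow as tf

def preprocess(path):
    # CPU side: decoding and reshaping, parallelised across threads.
    raw = tf.io.read_file(path)
    image = tf.io.decode_png(raw, channels=3)
    return tf.image.resize(image, (448, 448)) / 255.0  # placeholder size

paths = tf.data.Dataset.list_files("images/*.png")  # placeholder glob
dataset = (paths
           .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(8)
           .prefetch(tf.data.AUTOTUNE))  # overlap CPU work with GPU inference

model = tf.keras.models.load_model("model_dir", compile=False)  # placeholder
for batch in dataset:
    predictions = model.predict_on_batch(batch)
    # ... OpenCV postprocessing of `predictions` would go here ...
```

prefetch() keeps the input pipeline one step ahead of the model, so the CPU-heavy map() stage and GPU inference overlap instead of alternating.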
Sorry, in my previous comment I was thinking more about Eynollah than the Binarizer (hence the heavy CPU part). And @apacha's PR does already speed things up by an order of magnitude. I can see minor differences between the results from …
tf.data pipelining with heavy CPU processing itself seems to be hard to get right: to get true parallelisation, one probably needs tfaip...
fixup for batch prediction PR
In order to batch-binarize thousands of images, I've rewritten the prediction script; it now processes around 1500-2000 images per hour on a decent machine with two GPUs.
The proposed changes include:
- parallelisation via the mpire library (a rough multi-GPU sketch follows below)

Please note:
I know that the code looks completely different now (hopefully more readable) and is probably not 1:1 compatible with the remaining code in your repository, but I tried to put all the relevant changes into this PR and make the code as self-contained as possible to allow you to update the solution as you see fit.
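As a rough illustration of the mpire-based multi-GPU approach mentioned above (paths, the model location, and the exact SbbBinarizer interface are assumptions, not the actual PR code), each worker process could be pinned to one GPU and load its own model copy once:

```python
import os
from glob import glob
from mpire import WorkerPool

NUM_GPUS = 2  # as on the two-GPU machine from the PR description

def init_worker(worker_id, worker_state):
    # Pin this worker to a single GPU before TensorFlow is imported.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(worker_id % NUM_GPUS)
    from sbb_binarize.sbb_binarize import SbbBinarizer
    binarizer = SbbBinarizer()
    binarizer.load_model("model_dir")  # placeholder path
    worker_state["binarizer"] = binarizer

def binarize_one(worker_id, worker_state, image_path):
    # Reuse the model that was loaded once per worker.
    worker_state["binarizer"].run(image_path=image_path,
                                  save=image_path + ".bin.png")

if __name__ == "__main__":
    images = glob("input/*.png")  # placeholder glob
    # start_method="spawn" so the per-worker CUDA_VISIBLE_DEVICES takes effect.
    with WorkerPool(n_jobs=NUM_GPUS, pass_worker_id=True,
                    use_worker_state=True, start_method="spawn") as pool:
        pool.map(binarize_one, images, worker_init=init_worker)
```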
Thanks for sharing the codebase with us. I hope that this PR is of some help to you.