Preallocate imfilter kernel memory #45
Codecov Report
@@            Coverage Diff             @@
##           master      #45      +/-   ##
==========================================
- Coverage   85.71%   85.08%   -0.63%
==========================================
  Files           9        9
  Lines         420      456      +36
==========================================
+ Hits          360      388      +28
- Misses         60       68       +8
@ClaroHenrique I was looking into the code, and perhaps we can improve the situation by refactoring the imfilter_kernel and fastdistance functions to accept img and krn already at the same size. Instead of adding a third argument to all functions, we could simply pass the two arrays with the same size (krn padded), plus a third argument with the Cartesian indices if necessary. That would be more intuitive to maintain. Do you understand what I mean? Can you work on that?
Working on that.
Hi @ClaroHenrique, I think we are taking too long to address these optimizations in the code. Can we set up a meeting to discuss a plan that is effective and quick to execute?
@ClaroHenrique I merged it assuming that you saw good speedups with the change. I will test it locally now to see how it goes on my NVIDIA GPU.
This PR aims to optimize the kernel padding in the GPU imfilter. The idea is to preallocate a CuArray with the same size as TI and copy the kernel data into this array, which avoids allocating a full padded kernel in every imfilter operation.
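The preallocation idea can be sketched roughly as follows. This is a minimal CPU illustration, not the code from this PR: `pad_kernel!` is a hypothetical helper name, the array sizes are made up, and a plain `Array` stands in for the `CuArray` that the GPU path would use.

```julia
# Hypothetical helper (not from the PR): write `krn` into the leading corner
# of a preallocated buffer that has the same size as the image.
function pad_kernel!(padded::AbstractArray, krn::AbstractArray)
    all(size(krn) .<= size(padded)) ||
        throw(DimensionMismatch("kernel does not fit in the buffer"))
    fill!(padded, zero(eltype(padded)))   # clear data left by the previous kernel
    view(padded, axes(krn)...) .= krn     # copy the kernel into the corner
    return padded
end

TI = rand(Float32, 64, 64)     # stand-in for the training image (assumed size)
padded = similar(TI)           # allocated once, outside the loop

for _ in 1:3
    krn = rand(Float32, 5, 5)  # a new kernel on each iteration
    pad_kernel!(padded, krn)   # reuses `padded`; no image-sized allocation here
    # ... run the filter with TI and padded ...
end
```

The point of the design is that the only image-sized allocation happens once, before the loop; each filtering step then pays for a `fill!` and a small copy instead of building a new padded array.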