implement generic ROF model using Chambolle04 primal-dual method #233
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
+ Coverage   91.89%   92.14%   +0.25%
==========================================
  Files          11       12       +1
  Lines        1603     1642      +39
==========================================
+ Hits         1473     1513      +40
+ Misses        130      129       -1
```

Continue to review the full report at Codecov.
Benchmark on a9ed869:

```julia
using ImageFiltering
using ImageBase
using ImageFiltering.Models
using TestImages
using Random
using BenchmarkTools  # provides @btime
using CUDA
CUDA.allowscalar(false)

# 2d Gray
img = testimage("cameraman");
img_noisy = img .+ 0.1f0*randn(MersenneTwister(0), Float32, size(img));
@btime solve_ROF_PD($img_noisy, 0.2, 30);
# after:  59.914 ms (134 allocations: 67.00 MiB)
# before: 101.396 ms (134 allocations: 134.00 MiB)

img_noisy_cu = CuArray(Float32.(img_noisy));
@btime solve_ROF_PD($img_noisy_cu, 0.2, 30); # RTX 3090
# after:  4.188 ms (8793 allocations: 722.03 KiB)
# before: 4.060 ms (8793 allocations: 754.84 KiB)

# 2d RGB
img = testimage("lighthouse");
img_noisy = img .+ colorview(RGB, ntuple(i->0.05.*randn(MersenneTwister(i), size(img)), 3)...);
@btime solve_ROF_PD($img_noisy, 0.2, 30);
# after:  312.251 ms (136 allocations: 303.00 MiB)
# before: 610.880 ms (134 allocations: 597.00 MiB)

img_noisy_cu = CuArray(float32.(img_noisy));
@btime solve_ROF_PD($img_noisy_cu, 0.2, 30); # RTX 3090
# after:  5.381 ms (8433 allocations: 699.53 KiB)
# before: 7.176 ms (8433 allocations: 729.53 KiB)
```

I believe the implementation can be faster on both CPU and GPU, but the higher priority is to replace the old `imROF`. FWIW, the MATLAB version of the same algorithm used in our lab takes about 50 ms for a 2d gray image.
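For readers unfamiliar with the method, here is a minimal CPU-only sketch of Chambolle's (2004) fixed-point iteration for the dual ROF problem that `solve_ROF_PD` is based on. The helper names (`grad`, `div2d`, `chambolle_rof`) are illustrative stand-ins, not ImageFiltering internals; the boundary conventions and the scaling of `λ` follow Chambolle's paper and may differ from the merged code.

```julia
# Forward differences with homogeneous Neumann boundary (zero at the last index).
function grad(u)
    gx, gy = zero(u), zero(u)
    @views gx[1:end-1, :] .= u[2:end, :] .- u[1:end-1, :]
    @views gy[:, 1:end-1] .= u[:, 2:end] .- u[:, 1:end-1]
    return gx, gy
end

# Discrete divergence, the negative adjoint of `grad`:
# ⟨div2d(p), u⟩ = −⟨p, grad(u)⟩, as in Chambolle04.
function div2d(px, py)
    d = zero(px)
    @views d[1, :]       .= px[1, :]
    @views d[2:end-1, :] .= px[2:end-1, :] .- px[1:end-2, :]
    @views d[end, :]     .= -px[end-1, :]
    @views d[:, 1]       .+= py[:, 1]
    @views d[:, 2:end-1] .+= py[:, 2:end-1] .- py[:, 1:end-2]
    @views d[:, end]     .-= py[:, end-1]
    return d
end

function chambolle_rof(f::AbstractMatrix{<:Real}, λ::Real, num_iters::Integer)
    τ = 1/4                    # step size; cf. the 2nd remark after Thm 3.1 in Chambolle04
    px, py = zero(f), zero(f)  # dual variable p⁰ = 0
    for _ in 1:num_iters
        # p ← (p + τ∇(div p − f/λ)) / (1 + τ|∇(div p − f/λ)|)
        gx, gy = grad(div2d(px, py) .- f ./ λ)
        mag = sqrt.(gx .^ 2 .+ gy .^ 2)
        px = (px .+ τ .* gx) ./ (1 .+ τ .* mag)
        py = (py .+ τ .* gy) ./ (1 .+ τ .* mag)
    end
    return f .- λ .* div2d(px, py)  # primal solution u = f − λ·div p
end
```

Something like `chambolle_rof(Float64.(img_noisy), 0.2, 30)` would then mirror the grayscale call above, modulo the λ convention.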
Float32 is faster than Float64 on the GPU; it also gives a 1.5x performance boost on the CPU.
Force-pushed from 0408ac7 to 3c441fa.
@timholy Since this implementation is already much better than the old `imROF`, I might add more model solvers in the near future, so I want to make sure I'm adding code to the right place.
src/models.jl (outdated diff)

```julia
# use Float32 for better GPU performance
τ = Float32(1/4) # see 2nd remark after proof of Theorem 3.1.
λ = Float32(λ)
```
If you're going to do this, then it's best to do it in a stub method to reduce latency. I.e.,

```julia
function myfunc(x::Int, args...)
    # big method, slow to compile, so we compile it only for `x::Int`
end
myfunc(x::Integer, args...) = myfunc(Int(x), args...) # very fast to compile, we can make lots of instances
```
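Applied to this PR, the pattern might look like the following sketch; the kernel/stub split and the `_solve_ROF_PD` name are assumptions for illustration, not the code that was merged:

```julia
# Big method: would hold the expensive primal-dual loop, so it is compiled
# only for the canonical argument types (Float32 λ, Int iteration count).
function _solve_ROF_PD(img::AbstractArray, λ::Float32, num_iters::Int)
    τ = Float32(1/4)  # step size from the review snippet above; unused in this stub sketch
    # (expensive iteration elided in this sketch)
    return img
end

# Cheap stub: normalizes argument types in the caller. One-line forwarding
# methods like this compile quickly, so many call-site variants stay cheap.
solve_ROF_PD(img::AbstractArray, λ::Real, num_iters::Integer) =
    _solve_ROF_PD(img, Float32(λ), Int(num_iters))
```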
This is definitely a good suggestion; I just don't have a good estimate of how much benefit we'd get by doing this. Are there any utilities in SnoopCompile for watching this more closely, other than checking the first-call time with @time? For functions that take a long time to run, the compile time might not be easy to identify.

Edit: it seems yes, I found https://timholy.github.io/SnoopCompile.jl/stable/pgdsgui/#pgds
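A minimal sketch of what that measurement could look like, reusing the benchmark setup above and assuming the `@snoopi_deep` workflow from the linked docs (exact names and signatures may vary across SnoopCompile versions):

```julia
using SnoopCompileCore
# Record inference timing for the first call in a fresh session.
tinf = @snoopi_deep solve_ROF_PD(img_noisy, 0.2, 30)

using SnoopCompile
flatten(tinf)  # per-method inference times, to spot expensive specializations
```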
It's also a variant of something in the style guide: https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller
That CPU/GPU difference is impressive. WRT the Matlab comparison, does it change if you use
But agreed, further performance optimization can happen later. Thanks so much for doing this, and congrats on such a huge improvement!!
I believe once JuliaImages/ImageBase.jl#25 is done, we'll be much better off. I also introduced an in-place version, because some algorithms are built on top of denoisers; for instance, plug-and-play (https://engineering.purdue.edu/ChanGroup/project_PnP.html). The in-place version reduces the CPU runtime from ~55 ms to ~40 ms, so it's definitely worth doing. I plan to merge tomorrow unless more comments come in.
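To illustrate why the in-place variant matters for such outer loops, here's a hypothetical plug-and-play-style skeleton; the `solve_ROF_PD!(out, img, λ, num_iters)` signature is an assumption based on this thread, not a documented API:

```julia
out = similar(img_noisy)  # preallocate the output buffer once
for _ in 1:20             # outer iterations of the PnP-style scheme
    solve_ROF_PD!(out, img_noisy, 0.2, 30)  # denoising step writes into `out`
    # a data-fidelity / consensus step of the outer algorithm would go here
end
```

Since the denoiser runs once per outer iteration, reusing `out` avoids reallocating image-sized arrays every pass.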
Because the boundary handling of divergence has changed, this is not an identical implementation of `imROF`, so the old test code is removed in this commit. Instead, ImageFiltering has more comprehensive tests. See also: JuliaImages/ImageFiltering.jl#233

Because the boundary handling of divergence has changed, this is not an identical implementation of `imROF`, so the old test code is removed in this commit. Instead, ImageFiltering has more comprehensive tests. See also: JuliaImages/ImageFiltering.jl#233. This commit bumps ImageFiltering compatibility to at least v0.7.1.
Moved from JuliaImages/ImageBase.jl#24

The CUDA test code is maintained in an independent folder, test/cuda, with its own Project.toml. We don't have a CI setup for it, so I test it manually on my local machine. My plan is to set up GPU CI in Images.jl only after we finish JuliaImages/Images.jl#898, and then use a script to collect all the distributed CUDA-only tests across packages.

TODO:
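For reference, running those local CUDA tests might look like the following, assuming the layout described above (the test/cuda folder with its own Project.toml; the runtests.jl entry-point name is an assumption):

```julia
using Pkg
Pkg.activate("test/cuda")          # activate the CUDA-only test environment
Pkg.instantiate()                  # install its pinned dependencies
include("test/cuda/runtests.jl")   # assumed entry point, run from the repo root
```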