implement generic ROF model using Chambolle04 primal-dual method #233
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
+ Coverage   91.89%   92.14%   +0.25%
==========================================
  Files          11       12       +1
  Lines        1603     1642      +39
==========================================
+ Hits         1473     1513      +40
+ Misses        130      129       -1
```

Continue to review the full report at Codecov.
Benchmark on a9ed869:

```julia
using ImageFiltering
using ImageBase
using ImageFiltering.Models
using TestImages
using Random
using BenchmarkTools  # provides @btime
using CUDA
CUDA.allowscalar(false)

# 2d Gray
img = testimage("cameraman");
img_noisy = img .+ 0.1f0*randn(MersenneTwister(0), Float32, size(img));
@btime solve_ROF_PD($img_noisy, 0.2, 30);
# after:  59.914 ms (134 allocations: 67.00 MiB)
# before: 101.396 ms (134 allocations: 134.00 MiB)

img_noisy_cu = CuArray(Float32.(img_noisy));
@btime solve_ROF_PD($img_noisy_cu, 0.2, 30); # RTX 3090
# after:  4.188 ms (8793 allocations: 722.03 KiB)
# before: 4.060 ms (8793 allocations: 754.84 KiB)

# 2d RGB
img = testimage("lighthouse");
img_noisy = img .+ colorview(RGB, ntuple(i->0.05.*randn(MersenneTwister(i), size(img)), 3)...);
@btime solve_ROF_PD($img_noisy, 0.2, 30);
# after:  312.251 ms (136 allocations: 303.00 MiB)
# before: 610.880 ms (134 allocations: 597.00 MiB)

img_noisy_cu = CuArray(float32.(img_noisy));
@btime solve_ROF_PD($img_noisy_cu, 0.2, 30); # RTX 3090
# after:  5.381 ms (8433 allocations: 699.53 KiB)
# before: 7.176 ms (8433 allocations: 729.53 KiB)
```

I believe the implementation can be faster on both CPU and GPU, but the higher priority is to replace the old `imROF`. FWIW, the MATLAB version of the same algorithm used in our lab takes about 50 ms for a 2d gray image.
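For readers unfamiliar with the method, here is a minimal CPU-only sketch of Chambolle's (2004) fixed-point iteration for the dual ROF problem that `solve_ROF_PD` is based on. The helper names (`grad`, `div2d`, `chambolle_rof`) are illustrative stand-ins, not ImageFiltering internals; the boundary conventions and the scaling of `λ` follow Chambolle's paper and may differ from the merged code.

```julia
# Forward differences with homogeneous Neumann boundary (zero at the last index).
function grad(u)
    gx, gy = zero(u), zero(u)
    @views gx[1:end-1, :] .= u[2:end, :] .- u[1:end-1, :]
    @views gy[:, 1:end-1] .= u[:, 2:end] .- u[:, 1:end-1]
    return gx, gy
end

# Discrete divergence, the negative adjoint of `grad`:
# ⟨div2d(p), u⟩ = −⟨p, grad(u)⟩, as in Chambolle04.
function div2d(px, py)
    d = zero(px)
    @views d[1, :]       .= px[1, :]
    @views d[2:end-1, :] .= px[2:end-1, :] .- px[1:end-2, :]
    @views d[end, :]     .= -px[end-1, :]
    @views d[:, 1]       .+= py[:, 1]
    @views d[:, 2:end-1] .+= py[:, 2:end-1] .- py[:, 1:end-2]
    @views d[:, end]     .-= py[:, end-1]
    return d
end

function chambolle_rof(f::AbstractMatrix{<:Real}, λ::Real, num_iters::Integer)
    τ = 1/4                    # step size; cf. the 2nd remark after Thm 3.1 in Chambolle04
    px, py = zero(f), zero(f)  # dual variable p⁰ = 0
    for _ in 1:num_iters
        # p ← (p + τ∇(div p − f/λ)) / (1 + τ|∇(div p − f/λ)|)
        gx, gy = grad(div2d(px, py) .- f ./ λ)
        mag = sqrt.(gx .^ 2 .+ gy .^ 2)
        px = (px .+ τ .* gx) ./ (1 .+ τ .* mag)
        py = (py .+ τ .* gy) ./ (1 .+ τ .* mag)
    end
    return f .- λ .* div2d(px, py)  # primal solution u = f − λ·div p
end
```

Something like `chambolle_rof(Float64.(img_noisy), 0.2, 30)` would then mirror the grayscale call above, modulo the λ convention.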
Float32 is faster than Float64 on the GPU; it also gives a 1.5x performance boost on the CPU.
Force-pushed from 0408ac7 to 3c441fa.
@timholy Since this implementation is already much better than the old `imROF`, I might add more model solvers in the near future, so I want to make sure I'm adding code to the right place.
src/models.jl (outdated diff)

```julia
# use Float32 for better GPU performance
τ = Float32(1/4) # see 2nd remark after proof of Theorem 3.1.
λ = Float32(λ)
```
If you're going to do this, then it's best to do it in a stub method to reduce latency. I.e.,

```julia
function myfunc(x::Int, args...)
    # big method, slow to compile, so we compile it only for `x::Int`
end
myfunc(x::Integer, args...) = myfunc(Int(x), args...) # very fast to compile, we can make lots of instances
```
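Applied to this PR, the pattern might look like the following sketch; the kernel/stub split and the `_solve_ROF_PD` name are assumptions for illustration, not the code that was merged:

```julia
# Big method: would hold the expensive primal-dual loop, so it is compiled
# only for the canonical argument types (Float32 λ, Int iteration count).
function _solve_ROF_PD(img::AbstractArray, λ::Float32, num_iters::Int)
    τ = Float32(1/4)  # step size from the review snippet above; unused in this stub sketch
    # (expensive iteration elided in this sketch)
    return img
end

# Cheap stub: normalizes argument types in the caller. One-line forwarding
# methods like this compile quickly, so many call-site variants stay cheap.
solve_ROF_PD(img::AbstractArray, λ::Real, num_iters::Integer) =
    _solve_ROF_PD(img, Float32(λ), Int(num_iters))
```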
This is definitely a good suggestion; I just don't have a good estimate of how much benefit we'd get by doing this. Are there any utilities in SnoopCompile for watching this more closely, other than checking the first-call time with @time? For functions that take a long time to run, the compile time might not be easy to identify.

Edit: it seems yes, I found https://timholy.github.io/SnoopCompile.jl/stable/pgdsgui/#pgds
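A minimal sketch of what that measurement could look like, reusing the benchmark setup above and assuming the `@snoopi_deep` workflow from the linked docs (exact names and signatures may vary across SnoopCompile versions):

```julia
using SnoopCompileCore
# Record inference timing for the first call in a fresh session.
tinf = @snoopi_deep solve_ROF_PD(img_noisy, 0.2, 30)

using SnoopCompile
flatten(tinf)  # per-method inference times, to spot expensive specializations
```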
It's also a variant of something in the style guide: https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller
That CPU/GPU difference is impressive. WRT the Matlab comparison, does it change if you use
But agreed, further performance optimization can happen later. Thanks so much for doing this, and congrats on such a huge improvement!!
I believe once JuliaImages/ImageBase.jl#25 is done, we'll be much better off. I also introduced an in-place version, because some algorithms are built on top of denoisers; for instance, plug-and-play (https://engineering.purdue.edu/ChanGroup/project_PnP.html). The in-place version reduces the CPU runtime from ~55 ms to ~40 ms, so it's definitely worth doing. I plan to merge tomorrow unless more comments come in.
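To illustrate why the in-place variant matters for such outer loops, here's a hypothetical plug-and-play-style skeleton; the `solve_ROF_PD!(out, img, λ, num_iters)` signature is an assumption based on this thread, not a documented API:

```julia
out = similar(img_noisy)  # preallocate the output buffer once
for _ in 1:20             # outer iterations of the PnP-style scheme
    solve_ROF_PD!(out, img_noisy, 0.2, 30)  # denoising step writes into `out`
    # a data-fidelity / consensus step of the outer algorithm would go here
end
```

Since the denoiser runs once per outer iteration, reusing `out` avoids reallocating image-sized arrays every pass.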
Because the boundary handling of divergence has changed, this is not an identical implementation of `imROF`, so the old test code is removed in this commit. Instead, ImageFiltering has more comprehensive tests. See also: JuliaImages/ImageFiltering.jl#233

Because the boundary handling of divergence has changed, this is not an identical implementation of `imROF`, so the old test code is removed in this commit. Instead, ImageFiltering has more comprehensive tests. See also: JuliaImages/ImageFiltering.jl#233. This commit bumps ImageFiltering compatibility to at least v0.7.1.
Moved from JuliaImages/ImageBase.jl#24

The CUDA test code is maintained in an independent folder, test/cuda, with its own Project.toml. We don't have a CI setup for it, so I test it manually on my local machine. My plan is to set up GPU CI in Images.jl only after we finish JuliaImages/Images.jl#898, and then use a script to collect all the distributed CUDA-only tests across packages.

TODO:
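For reference, running those local CUDA tests might look like the following, assuming the layout described above (the test/cuda folder with its own Project.toml; the runtests.jl entry-point name is an assumption):

```julia
using Pkg
Pkg.activate("test/cuda")          # activate the CUDA-only test environment
Pkg.instantiate()                  # install its pinned dependencies
include("test/cuda/runtests.jl")   # assumed entry point, run from the repo root
```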