GPU performance improvements #488

Merged Aug 2, 2024 (50 commits)
Showing changes from 3 commits

Commits
45333fa
basic benchmarks
DiamonDinoia Jul 3, 2024
b95a082
added plotting script
DiamonDinoia Jul 4, 2024
ae55ca5
optimised plotting
DiamonDinoia Jul 8, 2024
16e27f0
fixed plotting and metrics
DiamonDinoia Jul 8, 2024
49d1f21
fixed the plot script
DiamonDinoia Jul 8, 2024
2fdae68
bin_size_x is a function of the shared memory available
DiamonDinoia Jul 8, 2024
c0d9923
bin_size_x is a function of the shared memory available
DiamonDinoia Jul 8, 2024
907797c
minor optimizations in 1D
DiamonDinoia Jul 9, 2024
60f4780
optimized nupts driven
DiamonDinoia Jul 12, 2024
35dcc66
Optimized 1D and 2D
DiamonDinoia Jul 15, 2024
e1ad9bb
Merge branch 'master' into gpu-optimizations
DiamonDinoia Jul 15, 2024
366295d
3D integer operations
DiamonDinoia Jul 18, 2024
24bf6be
3D SM and GM optimized
DiamonDinoia Jul 18, 2024
960117a
bump cuda version
DiamonDinoia Jul 18, 2024
4295a86
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Jul 23, 2024
c1b14c6
changed matlab to generate necessary cuda upsampfact files
DiamonDinoia Jul 23, 2024
f300d2d
added new coeffs
DiamonDinoia Jul 23, 2024
e86c762
Merge remote-tracking branch 'refs/remotes/origin/gpu-optimizations' …
DiamonDinoia Jul 23, 2024
db0457a
restoring .m from master
DiamonDinoia Jul 23, 2024
d0ce11e
updated hook
DiamonDinoia Jul 23, 2024
513ce4b
updated matlab upsampfact
DiamonDinoia Jul 23, 2024
798717d
updated coefficients
DiamonDinoia Jul 23, 2024
282baf5
new coeffs
DiamonDinoia Jul 23, 2024
12822a2
updated cufinufft to new coeff
DiamonDinoia Jul 23, 2024
badf22f
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Jul 23, 2024
bf6328b
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Jul 23, 2024
ae783da
picked good defaults for method
DiamonDinoia Jul 24, 2024
d29fcf5
update configuration
DiamonDinoia Jul 24, 2024
73f937b
updated build system
DiamonDinoia Jul 25, 2024
0724866
fixing jenkins
DiamonDinoia Jul 25, 2024
8cd50fc
using cuda 11.2
DiamonDinoia Jul 25, 2024
49a9d7e
using sm90 atomics
DiamonDinoia Jul 25, 2024
041a536
updated script
DiamonDinoia Jul 25, 2024
54683c3
fixed bin sizes
DiamonDinoia Jul 26, 2024
4f19103
Merge branch 'master' into gpu-optimizations
DiamonDinoia Jul 26, 2024
dc3a628
using floor in fold_rescale updated changelog
DiamonDinoia Jul 26, 2024
b3237f7
fixed a mistake
DiamonDinoia Jul 26, 2024
db80aad
added comments for review
DiamonDinoia Jul 26, 2024
c225fb5
fixing review comments
DiamonDinoia Jul 31, 2024
394550f
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Jul 31, 2024
5606aa0
merged master
DiamonDinoia Jul 31, 2024
74ccd71
fixed cmake
DiamonDinoia Jul 31, 2024
ee28d05
Gcc-9 fixes; Ker size fixed too
DiamonDinoia Aug 1, 2024
466ddff
windows compatibility tweak; unit testing the 1.25 upsampfact
DiamonDinoia Aug 1, 2024
3f60ca4
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Aug 1, 2024
fb48ff8
added forgotten c++17 flag
DiamonDinoia Aug 1, 2024
5d7e276
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Aug 2, 2024
afabb3f
Addressing review comments
DiamonDinoia Aug 2, 2024
c3df5e1
Added warning
DiamonDinoia Aug 2, 2024
44c523b
updated changelog
DiamonDinoia Aug 2, 2024
26 changes: 19 additions & 7 deletions devel/gen_all_horner_C_code.m
@@ -10,14 +10,26 @@
 clear
 opts = struct();
 
-for upsampfac = [2.0, 1.25]; % sigma: either 2 (default) or low (eg 5/4)
-fprintf('upsampfac = %g...\n',upsampfac)
-
-ws = 2:16;
-opts.wpad = true; % pad kernel eval to multiple of 4
+ws = 2:16;
+upsampfac = 1.25; % sigma (upsampling): either 2 (default) or low (eg 5/4).
+opts.wpad = false; % pad kernel eval to multiple of 4
 
-if upsampfac==2, fid = fopen('../src/ker_horner_allw_loop_constexpr.c','w');
-else, fid = fopen('../src/ker_lowupsampfac_horner_allw_loop_constexpr.c','w');
+if upsampfac==2, fid = fopen('../include/cufinufft/contrib/ker_horner_allw_loop.inc','w');
+else, fid = fopen('../include/cufinufft/contrib/ker_lowupsampfac_horner_allw_loop.inc','w');
 end
+fwrite(fid,sprintf('// Code generated by gen_all_horner_C_code.m in finufft/devel\n'));
+fwrite(fid,sprintf('// Authors: Alex Barnett & Ludvig af Klinteberg.\n// (C) The Simons Foundation, Inc.\n'));
+for j=1:numel(ws)
+w = ws(j)
+if upsampfac==2 % hardwire the betas for this default case
+betaoverws = [2.20 2.26 2.38 2.30]; % matches setup_spreader
+beta = betaoverws(min(4,w-1)) * w; % uses last entry for w>=5
+d = w + 2 + (w<=8); % between 2-3 more degree than w
+else % use formulae, must match params in setup_spreader...
+gamma=0.97; % safety factor
+betaoverws = gamma*pi*(1-1/(2*upsampfac)); % from cutoff freq formula
+beta = betaoverws * w;
+d = w + 1 + (w<=8); % less, since beta smaller, smoother
+end
 fwrite(fid,sprintf('// Code generated by gen_all_horner_C_code.m in finufft/devel\n'));
 fwrite(fid,sprintf('// Authors: Alex Barnett & Ludvig af Klinteberg.\n// (C) The Simons Foundation, Inc.\n'));
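The `beta` and `d` selection in the hunk above can be checked by hand with a small C++ transcription. This is only a sketch of the arithmetic in the MATLAB script: the names `KernelParams` and `kernel_params` are hypothetical and do not appear in FINUFFT, and the script's own comments say the real values must match `setup_spreader`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical container for the two values the MATLAB script computes per width.
struct KernelParams {
  double beta;  // kernel shape parameter (beta in the script)
  int d;        // value of d in the script
};

// Mirrors the if/else branch of gen_all_horner_C_code.m for one width w.
// Assumption: w lies in 2..16, the range of ws in the script.
inline KernelParams kernel_params(int w, double upsampfac) {
  assert(w >= 2 && w <= 16);
  const double pi = 3.14159265358979323846;
  if (upsampfac == 2.0) {
    // Hardwired ratios; the last entry is reused for w >= 5.
    const double betaoverws[4] = {2.20, 2.26, 2.38, 2.30};
    // MATLAB indexes from 1, so subtract 1 for the C++ array.
    const double beta = betaoverws[std::min(4, w - 1) - 1] * w;
    return {beta, w + 2 + (w <= 8 ? 1 : 0)};
  }
  // Low upsampling factor: ratio from the cutoff-frequency formula.
  const double gamma = 0.97;  // safety factor
  const double beta = gamma * pi * (1.0 - 1.0 / (2.0 * upsampfac)) * w;
  return {beta, w + 1 + (w <= 8 ? 1 : 0)};
}
```

For example, `kernel_params(10, 2.0)` gives `beta = 2.30 * 10` and `d = 12`, while `kernel_params(4, 1.25)` gives `beta = 0.97 * pi * 0.6 * 4` and `d = 6`.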
4 changes: 2 additions & 2 deletions devel/gen_ker_horner_loop_C_code.m
@@ -37,8 +37,8 @@
 else
 width = w;
 end
-for n=1:d+1 % loop over poly coeff powers
-s = sprintf('FLT c%d[] = {%.16E',n-1, C(n,1));
+for n=1:d % loop over poly coeff powers
+s = sprintf('constexpr FLT c%d[] = {%.16E',n-1, C(n,1));
 for i=2:width % loop over segments
 s = sprintf('%s, %.16E', s, C(n,i));
 end
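The loop above emits one `constexpr FLT cN[]` array per polynomial power, with one entry per segment. As a rough illustration of how such arrays lend themselves to Horner evaluation, here is a minimal sketch. The coefficient values are made up and are not FINUFFT's, `FLT` is assumed to be `double`, `eval_segments` is a hypothetical name, and treating `c0` as the constant term is an assumption about the generated code rather than something this diff shows.

```cpp
#include <cassert>
#include <cmath>

using FLT = double;  // assumption: the real build selects float or double

// Made-up coefficients in the emitted layout: 3 powers, width = 2 segments.
constexpr FLT c0[] = {1.0, 2.0};
constexpr FLT c1[] = {0.5, -0.5};
constexpr FLT c2[] = {0.25, 0.125};

// Horner's rule per segment i: c0[i] + z*(c1[i] + z*c2[i]).
// Assumes c0 holds the constant term and c2 the highest power.
inline void eval_segments(FLT z, FLT out[2]) {
  for (int i = 0; i < 2; ++i) {
    out[i] = c0[i] + z * (c1[i] + z * c2[i]);
  }
}
```

Declaring the arrays `constexpr` lets the compiler treat the coefficients as compile-time constants, which is presumably why the generator now emits that qualifier.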
389 changes: 183 additions & 206 deletions include/cufinufft/contrib/ker_horner_allw_loop.inc

Large diffs are not rendered by default.

192 changes: 192 additions & 0 deletions include/cufinufft/contrib/ker_lowupsampfac_horner_allw_loop.inc

Large diffs are not rendered by default.