exa #25
Conversation
@jbcaillau Note that ExaModels works with any GPU backend (NVIDIA CUDA, Intel, Apple, ...).
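For context, a minimal sketch of building an ExaModels instance on a GPU backend. The backend object comes from KernelAbstractions.jl (here the NVIDIA one; other vendors have analogous backends); the toy variables/constraints below are illustrative, not the Goddard model, and the exact `ExaCore` signature may differ across ExaModels versions.

```julia
using ExaModels, CUDA

backend = CUDABackend()            # NVIDIA; swap in the Intel/Apple equivalent as needed
core = ExaCore(Float64; backend)   # model data is then stored on the device

x = variable(core, 10; start = fill(0.5, 10))
objective(core, (x[i] - 2)^2 for i in 1:10)
constraint(core, x[i] + x[i+1] for i in 1:9; lcon = 0.0, ucon = 1.0)

model = ExaModel(core)             # ready for a GPU-capable solver such as MadNLP
```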
@amontoison @frapac @0Yassine0 run of goddard-exa2.jl on a cluster at Inria:

```
N = 100
exa0: 311.847 ms (25632 allocations: 240.12 MiB)   # 90 iterations
exa1: 343.495 ms (50469 allocations: 242.28 MiB)   # 90 iterations
exa2: 129.811 ms (313799 allocations: 8.74 MiB)    # 20 iterations

N = 500
exa0: 734.530 ms (10977 allocations: 1.92 GiB)     # 38 iterations
exa1: 733.023 ms (21471 allocations: 1.92 GiB)     # 38 iterations
exa2: 628.971 ms (1183995 allocations: 32.42 MiB)  # 67 iterations

N = 1000
exa0: 1.748 s (14564 allocations: 9.66 GiB)        # 50 iterations
exa1: 1.602 s (28779 allocations: 9.66 GiB)        # 50 iterations
exa2: 648.181 ms (1238880 allocations: 34.74 MiB)  # 73 iterations

N = 5000
exa0: 6.547 s (9913 allocations: 64.37 GiB)        # 31 iterations
exa1: 6.637 s (41064 allocations: 64.37 GiB)       # 31 iterations
exa2: 686.667 ms (658404 allocations: 22.20 MiB)   # 34 iterations
```
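Timings in the "xxx ms (n allocations: m MiB)" format above are typically produced with BenchmarkTools.jl. A hedged sketch of such a benchmark loop, where `solve_goddard` is a hypothetical wrapper around the goddard-exa2.jl solve call:

```julia
using BenchmarkTools

for N in (100, 500, 1000, 5000)
    println("N = ", N)
    # @btime reports the minimum time plus allocation count and volume,
    # matching the lines quoted in this thread
    @btime solve_goddard($N)
end
```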
@amontoison @frapac @0Yassine0 run of goddard-exa2.jl using a single V100 GPU (🙏🏽 JM Lacroix from LJAD); number of iterations still to be checked:

```
N = 100
exa0: 122.578 ms (22124 allocations: 209.36 MiB)
exa1: 126.011 ms (43646 allocations: 211.22 MiB)
exa2: 86.335 ms (259116 allocations: 7.47 MiB)

N = 500
exa0: 241.734 ms (9332 allocations: 1.62 GiB)
exa1: 242.094 ms (18284 allocations: 1.62 GiB)
exa2: 137.797 ms (384012 allocations: 11.25 MiB)

N = 1000
exa0: 677.497 ms (13432 allocations: 8.91 GiB)
exa1: 678.521 ms (26570 allocations: 8.91 GiB)
exa2: 1.003 s (2398734 allocations: 68.44 MiB)

N = 2000
exa0: 1.829 s (15500 allocations: 42.40 GiB)
exa1: 1.833 s (30228 allocations: 42.40 GiB)
exa2: 505.739 ms (779640 allocations: 23.40 MiB)

N = 5000
exa0: 2.309 s (7886 allocations: 50.31 GiB)
exa1: 2.319 s (16848 allocations: 50.31 GiB)
exa2: 502.286 ms (469136 allocations: 17.29 MiB)
```
@jbcaillau Can you print the number of iterations? It's quite good; it seems you always get a speed-up on GPU.
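A hedged sketch of how the iteration count can be retrieved after a MadNLP solve: `madnlp` returns execution statistics whose `iter` field holds the number of interior-point iterations (field name per `MadNLPExecutionStats`; check against your MadNLP version). `model` stands for an ExaModel built as in goddard-exa2.jl.

```julia
using MadNLP

# quiet solve, then report iterations alongside the timing
stats = madnlp(model; print_level = MadNLP.ERROR)
println("iterations: ", stats.iter)
```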
Test in single precision on GPU (Inria nef cluster):

```
N = 100
exa0: 72.682 ms (3175 allocations: 80.58 MiB)
exa1: 75.944 ms (12425 allocations: 81.17 MiB)
exa2: 146.233 ms (294589 allocations: 7.29 MiB)

N = 500
exa0: 405.774 ms (3603 allocations: 1.67 GiB)
exa1: 411.145 ms (13185 allocations: 1.67 GiB)
exa2: 227.096 ms (312666 allocations: 8.08 MiB)

N = 1000
exa0: 1.522 s (5222 allocations: 10.00 GiB)
exa1: 1.485 s (20226 allocations: 10.00 GiB)
exa2: 310.898 ms (343664 allocations: 9.14 MiB)

N = 2000
exa0: 1.750 s (3524 allocations: 23.32 GiB)
exa1: 1.760 s (13628 allocations: 23.32 GiB)
exa2: 385.701 ms (287417 allocations: 8.55 MiB)

N = 5000
exa0: 3.579 s (2746 allocations: 40.32 GiB)
exa1: 3.578 s (10400 allocations: 40.32 GiB)
exa2: 785.430 ms (359051 allocations: 12.60 MiB)

N = 8000
exa0: 8.069 s (3166 allocations: 48.59 GiB)
exa1: 8.031 s (12220 allocations: 48.59 GiB)
exa2: 1.366 s (423244 allocations: 16.09 MiB)

N = 10000
exa0: 9.788 s (2956 allocations: 44.69 GiB)
exa1: 9.880 s (11310 allocations: 44.69 GiB)
exa2: 2.044 s (499806 allocations: 19.33 MiB)
```
@jbcaillau For your information, we can now also solve ExaModels instances in quadruple precision (using Quadmath.jl).
Hi @frapac, thanks for the information. The tests above are in single precision only (Inria cluster); we still need to re-run them in double precision on GPUs (LJAD cluster) 🤞🏾
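A hedged sketch of how the precision is selected: the arithmetic type is an argument of `ExaCore`, so the same model can be built in single, double, or (via Quadmath.jl, as @frapac notes) quadruple precision. Solver and GPU-backend support must be checked separately for each type (Float32 on GPU; Float128 typically CPU-only).

```julia
using ExaModels, Quadmath

core32  = ExaCore(Float32)    # single precision (the Inria nef runs above)
core64  = ExaCore(Float64)    # double precision (to re-run on the LJAD cluster)
core128 = ExaCore(Float128)   # quadruple precision via Quadmath.jl
```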
Follow-up here: #26
@0Yassine0 test on the Goddard case
@frapac the point with ExaModels is to use (i) multi-threaded CPUs and/or (ii) GPUs, as documented here. Do you have a configuration available to properly test the code with respect to (i) and (ii)?
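A hedged sketch of the two configurations in question, both selected through the KernelAbstractions backend passed to `ExaCore`: start Julia with `julia -t auto` for (i). The backend-constructor names are assumptions to verify against the ExaModels documentation.

```julia
using ExaModels
using KernelAbstractions: CPU
using CUDA: CUDABackend

core_cpu = ExaCore(Float64; backend = CPU())           # (i) multi-threaded CPU
core_gpu = ExaCore(Float64; backend = CUDABackend())   # (ii) NVIDIA GPU
```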