
exa #25

Merged: 7 commits merged into control-toolbox:main on Aug 9, 2024
Conversation

jbcaillau (Member)

@0Yassine0 test on the Goddard case:

  • between NLP models (generated by OptimalControl) and ExaModels
  • solved either by Ipopt or MadNLP

@frapac the point with ExaModels is to use (i) multi-threaded CPUs and/or (ii) GPUs, as documented here. Do you have a config available to test the code properly with respect to (i) and (ii)?
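For context, a rough sketch of the modelling/solving pattern being compared, with a toy problem standing in for the actual Goddard discretization (the real model lives in goddard-exa2.jl); the ExaModels/MadNLP/Ipopt calls are the standard entry points, everything else is illustrative:

```julia
using ExaModels, MadNLP, NLPModelsIpopt

# Toy NLP written directly with ExaModels (stand-in for the Goddard discretization).
c = ExaCore()                                     # Float64, CPU evaluation
x = variable(c, 10; start = 0.5)                  # decision variables
constraint(c, x[i] + x[i+1] for i = 1:9; lcon = 0.0, ucon = 1.0)
objective(c, (x[i] - 1.0)^2 for i = 1:10)
m = ExaModel(c)

# Same ExaModels instance handed to the two NLP solvers compared in this PR.
stats_madnlp = madnlp(m)      # MadNLP
stats_ipopt  = ipopt(m)       # Ipopt, through NLPModelsIpopt
```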

jbcaillau (Member Author) commented Aug 6, 2024

amontoison commented Aug 6, 2024

@jbcaillau Note that ExaModels works with any GPU backend (NVIDIA/CUDA, Intel, Apple, ...).
However, MadNLP currently only works with NVIDIA GPUs.
The current bottleneck is the lack of sparse linear solvers for other GPU architectures to solve the KKT systems in the optimization solver.
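A minimal sketch of that NVIDIA path, assuming a CUDA-capable machine with MadNLPGPU installed (MadNLPGPU provides the cuDSS-based GPU linear solver); the toy problem is illustrative:

```julia
using CUDA, ExaModels, MadNLP, MadNLPGPU

# Building the model with a CUDA backend makes all callbacks (objective,
# constraints, derivatives) evaluate on the GPU.
c = ExaCore(; backend = CUDABackend())
x = variable(c, 1_000; start = 0.1)
objective(c, (x[i] - 1.0)^2 for i = 1:1_000)
m = ExaModel(c)

# With MadNLPGPU loaded, MadNLP can assemble and factorize the KKT system on the GPU.
stats = madnlp(m)
```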

jbcaillau (Member Author)

@amontoison @frapac @0Yassine0 run of goddard-exa2.jl on a cluster at Inria:

N = 100
exa0:  311.847 ms (25632 allocations: 240.12 MiB) # 90 iterations
exa1:  343.495 ms (50469 allocations: 242.28 MiB) # 90 iterations
exa2:  129.811 ms (313799 allocations: 8.74 MiB) # 20 iterations

N = 500
exa0:  734.530 ms (10977 allocations: 1.92 GiB) # 38 iterations
exa1:  733.023 ms (21471 allocations: 1.92 GiB) # 38 iterations
exa2:  628.971 ms (1183995 allocations: 32.42 MiB) # 67 iterations

N = 1000
exa0:  1.748 s (14564 allocations: 9.66 GiB) # 50 iterations
exa1:  1.602 s (28779 allocations: 9.66 GiB) # 50 iterations
exa2:  648.181 ms (1238880 allocations: 34.74 MiB) # 73 iterations

N = 5000
exa0:  6.547 s (9913 allocations: 64.37 GiB)  # 31 iterations
exa1:  6.637 s (41064 allocations: 64.37 GiB) # 31 iterations
exa2:  686.667 ms (658404 allocations: 22.20 MiB) # 34 iterations
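The timing lines above read like BenchmarkTools @btime output; a minimal sketch of how such a sweep could be collected (build_exa0 / build_exa1 / build_exa2 are hypothetical placeholders for the three formulations benchmarked here):

```julia
using BenchmarkTools, MadNLP

for N in (100, 500, 1000, 5000)
    println("N = ", N)
    for build in (build_exa0, build_exa1, build_exa2)   # hypothetical constructors
        m = build(N)                                     # NLP instance for grid size N
        @btime madnlp($m; print_level = MadNLP.ERROR)    # silence the solver log
    end
end
```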

0Yassine0 merged commit 74346ac into control-toolbox:main on Aug 9, 2024
jbcaillau (Member Author) commented Aug 27, 2024

@amontoison @frapac @0Yassine0 run of goddard-exa2.jl using a single V100 GPU (🙏🏽 JM Lacroix from LJAD); the number of iterations remains to be checked, e.g. for N = 1000:

N = 100
exa0:  122.578 ms (22124 allocations: 209.36 MiB)
exa1:  126.011 ms (43646 allocations: 211.22 MiB)
exa2:  86.335 ms (259116 allocations: 7.47 MiB)

N = 500
exa0:  241.734 ms (9332 allocations: 1.62 GiB)
exa1:  242.094 ms (18284 allocations: 1.62 GiB)
exa2:  137.797 ms (384012 allocations: 11.25 MiB)

N = 1000
exa0:  677.497 ms (13432 allocations: 8.91 GiB)
exa1:  678.521 ms (26570 allocations: 8.91 GiB)
exa2:  1.003 s (2398734 allocations: 68.44 MiB)

N = 2000
exa0:  1.829 s (15500 allocations: 42.40 GiB)
exa1:  1.833 s (30228 allocations: 42.40 GiB)
exa2:  505.739 ms (779640 allocations: 23.40 MiB)

N = 5000
exa0:  2.309 s (7886 allocations: 50.31 GiB)
exa1:  2.319 s (16848 allocations: 50.31 GiB)
exa2:  502.286 ms (469136 allocations: 17.29 MiB)

amontoison commented Aug 27, 2024

@jbcaillau Can you print the number of iterations?
I suspect that we need more iterations on the GPU because of the robustness of the linear solver (cuDSS).

Still, it's quite good: you always seem to get a speed-up on the GPU.
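A sketch of how the iteration counts could be reported next to the timings; it assumes the result object exposes an iter field, as SolverCore-style stats do, and m_cpu / m_gpu are placeholders for the same ExaModel built on each backend:

```julia
using MadNLP

# m_cpu / m_gpu: placeholders for the model built with the CPU and CUDA backends.
stats_cpu = madnlp(m_cpu)
stats_gpu = madnlp(m_gpu)

println("CPU: ", stats_cpu.iter, " iterations, status = ", stats_cpu.status)
println("GPU: ", stats_gpu.iter, " iterations, status = ", stats_gpu.status)
```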

jbcaillau (Member Author) commented Dec 10, 2024

Test in single precision on a GPU (Inria nef cluster), tol = 1e-5 (compare #25 (comment)), using docp_exa_s and oarsub -p "gpu='YES'" -l /nodes=1,walltime=1 -I:

N = 100
exa0:  72.682 ms (3175 allocations: 80.58 MiB)
exa1:  75.944 ms (12425 allocations: 81.17 MiB)
exa2:  146.233 ms (294589 allocations: 7.29 MiB)

N = 500
exa0:  405.774 ms (3603 allocations: 1.67 GiB)
exa1:  411.145 ms (13185 allocations: 1.67 GiB)
exa2:  227.096 ms (312666 allocations: 8.08 MiB)

N = 1000
exa0:  1.522 s (5222 allocations: 10.00 GiB)
exa1:  1.485 s (20226 allocations: 10.00 GiB)
exa2:  310.898 ms (343664 allocations: 9.14 MiB)

N = 2000
exa0:  1.750 s (3524 allocations: 23.32 GiB)
exa1:  1.760 s (13628 allocations: 23.32 GiB)
exa2:  385.701 ms (287417 allocations: 8.55 MiB)

N = 5000
exa0:  3.579 s (2746 allocations: 40.32 GiB)
exa1:  3.578 s (10400 allocations: 40.32 GiB)
exa2:  785.430 ms (359051 allocations: 12.60 MiB)

N = 8000
exa0:  8.069 s (3166 allocations: 48.59 GiB)
exa1:  8.031 s (12220 allocations: 48.59 GiB)
exa2:  1.366 s (423244 allocations: 16.09 MiB)

N = 10000
exa0:  9.788 s (2956 allocations: 44.69 GiB)
exa1:  9.880 s (11310 allocations: 44.69 GiB)
exa2:  2.044 s (499806 allocations: 19.33 MiB)
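A minimal sketch of the single-precision GPU setup used above, assuming ExaCore accepts the element type as its first positional argument (the actual model is built by the script's docp_exa_s helper; problem data here is illustrative):

```julia
using CUDA, ExaModels, MadNLP, MadNLPGPU

# Single-precision model with GPU evaluation of the callbacks.
c = ExaCore(Float32; backend = CUDABackend())
x = variable(c, 100; start = 0.1f0)
objective(c, (x[i] - 1.0f0)^2 for i = 1:100)
m = ExaModel(c)

# Looser tolerance, matching the tol = 1e-5 used for the runs above.
stats = madnlp(m; tol = 1e-5)
```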

frapac (Collaborator) commented Dec 11, 2024

@jbcaillau For your information, we can also solve ExaModels instances in quadruple precision now (using quadmath).
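A rough sketch of what that could look like, assuming ExaCore accepts Float128 from Quadmath.jl as its element type; only the model construction and a callback evaluation are shown, since the choice of a Float128-capable linear solver is not detailed here:

```julia
using Quadmath, ExaModels, NLPModels

# Build an ExaModels instance whose callbacks evaluate in quadruple precision.
c = ExaCore(Float128)
x = variable(c, 10; start = Float128(1) / 3)
objective(c, (x[i] - 1)^2 for i = 1:10)
m = ExaModel(c)

# Standard NLPModels API call; the objective value comes back as a Float128.
obj(m, m.meta.x0)
```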

jbcaillau (Member Author)

> @jbcaillau For your information, we can also solve ExaModels instances in quadruple precision now (using quadmath).

Hi @frapac, thanks for the information: the tests above are in single precision only (Inria cluster); the tests need to be re-run on double-precision GPUs (LJAD cluster) 🤞🏾

jbcaillau (Member Author)

Follow-up here: #26
