
GPU Operators use work vectors #1673

Merged
merged 5 commits into from
Oct 2, 2024

@jeremylt (Member) commented Sep 24, 2024

This PR applies lessons from the Gen backends. It processes the fields in an order that avoids duplicate work while still allowing buffers to be reused, specifically the new cached vector buffer.

  • Refactor to expose where work vectors can be used
  • Reorder processing of inputs/outputs to simplify the logic
  • Add work vectors

```diff
@@ -862,7 +862,7 @@ int CeedVectorPointwiseMult(CeedVector w, CeedVector x, CeedVector y) {
 CeedCall(CeedVectorGetLength(w, &length_w));
 CeedCall(CeedVectorGetLength(x, &length_x));
 CeedCall(CeedVectorGetLength(y, &length_y));
-CeedCheck(length_w == length_x && length_w == length_y, ceed, CEED_ERROR_UNSUPPORTED,
+CeedCheck(length_x >= length_x && length_y >= length_w, ceed, CEED_ERROR_UNSUPPORTED,
```

@jeremylt (Member Author) commented on this change:

Needed to allow cached vectors that are longer than strictly required.

```diff
@@ -331,11 +331,6 @@ static int CeedBasisApplyAtPointsCheckDims(CeedBasis basis, CeedInt num_elem, co
 if (x_ref != CEED_VECTOR_NONE) CeedCall(CeedVectorGetLength(x_ref, &x_length));
 if (u != CEED_VECTOR_NONE) CeedCall(CeedVectorGetLength(u, &u_length));
 
 // Check compatibility of topological and geometrical dimensions
```

@jeremylt (Member Author) commented on this change:

Also needed to allow cached vectors that are longer than strictly required.

@jeremylt (Member Author) commented:

Blocks #1646

@jeremylt force-pushed the jeremy/use-work-vecs branch 2 times, most recently from 5ba7b9b to 1945a8d on September 26, 2024 18:14
@jeremylt (Member Author) commented Sep 26, 2024

The branch also passes Ratel CI:

```console
$ make prove -j CEED_BACKENDS=/gpu/cuda/shared
-----------------------------------------
|      ____            __           __  |
|     / __ \  ____ _  / /_  ___    / /  |
|    / /_/ / / __ `/ / __/ / _ \  / /   |
|   / _, _/ / /_/ / / /_  /  __/ / /    |
|  /_/ |_|  \__,_/  \__/  \___/ /_/     |
-----------------------------------------

-----------------------------------------

Dependencies:
CEED_DIR      = /home/jeremy/Dev/libCEED
PETSC_DIR     = /home/jeremy/Dev/petsc
PETSC_ARCH    = arch-cuda-mpich

Optional Dependencies:
ENZYME_LIB     = (not found)

-----------------------------------------

Running unit tests
- Testing with libCEED backends: /gpu/cuda/shared
- Testing on 1 processes
prove -j 16 --exec 'python3 tests/junit.py --petsc-arch arch-cuda-mpich --ceed-backends /gpu/cuda/shared --mode tap --nproc 1 --pool-size 1' t000-init t001-view t002-view t003-ts-monitor t004-ts-checkpoint t010-eigensolver t050-mpm t100-static-elasticity t101-static-elasticity t102-static-elasticity t103-static-elasticity t110-static-elasticity t111-static-elasticity t120-static-elasticity t121-static-elasticity t122-static-elasticity t123-static-elasticity t211-quasistatic-elasticity t221-quasistatic-elasticity t222-quasistatic-elasticity ex01-static ex02-quasistatic ex03-dynamic
t050-mpm ..................... ok
t010-eigensolver ............. ok
t000-init .................... ok
t001-view .................... ok
t002-view .................... ok
t003-ts-monitor .............. ok
t101-static-elasticity ....... ok
t122-static-elasticity ....... ok
t123-static-elasticity ....... ok
t100-static-elasticity ....... ok
t111-static-elasticity ....... ok
t211-quasistatic-elasticity .. ok
t103-static-elasticity ....... ok
t222-quasistatic-elasticity .. ok
t110-static-elasticity ....... ok
t120-static-elasticity ....... ok
t102-static-elasticity ....... ok
t221-quasistatic-elasticity .. ok
t121-static-elasticity ....... ok
ex03-dynamic ................. ok
t004-ts-checkpoint ........... ok
ex01-static .................. ok
ex02-quasistatic ............. ok
All tests successful.
Files=23, Tests=197, 482 wallclock secs ( 0.12 usr  0.04 sys + 1145.66 cusr 120.38 csys = 1266.20 CPU)
Result: PASS
```

@jeremylt (Member Author) commented:

Note to self for follow-up: open an issue about bringing this cleanup and simplification to the CPU backends.

@jeremylt self-assigned this Sep 27, 2024
@jeremylt force-pushed the jeremy/use-work-vecs branch from f1d6ed5 to 96093a6 on September 27, 2024 22:28
@jeremylt merged commit bdd4742 into main Oct 2, 2024
27 of 28 checks passed
@jeremylt deleted the jeremy/use-work-vecs branch October 2, 2024 14:33