GPU Operators use work vectors #1673

jeremylt · 2024-09-24T20:52:41Z

This PR applies lessons from the Gen backends: fields are processed in an order that avoids duplicate work and allows buffers to be reused, in particular the new cached work vectors.

Refactor to expose places where work vectors can be used
Reorder processing of inputs/outputs to simplify
Add work vectors
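
For readers unfamiliar with the idea, here is a minimal sketch of a work-vector cache in plain C. It is not the libCEED API or the implementation in this branch; `WorkPool`, `work_pool_get`, `work_pool_restore`, and `work_pool_destroy` are hypothetical names, and host `double` arrays stand in for device buffers. The point it illustrates is that a checkout may hand back a buffer longer than requested, which is why callers must tolerate over-long vectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define WORK_POOL_MAX 8  // hypothetical fixed capacity, chosen for the sketch

typedef struct {
  double *data;
  size_t  length;
  bool    in_use;
} WorkVec;

typedef struct {
  WorkVec vecs[WORK_POOL_MAX];
  size_t  num_vecs;
} WorkPool;

// Check out a buffer with at least min_length entries.
// A free cached buffer is reused when it is long enough, so the
// returned buffer may be longer than requested.
double *work_pool_get(WorkPool *pool, size_t min_length) {
  assert(pool != NULL);
  for (size_t i = 0; i < pool->num_vecs; i++) {
    WorkVec *v = &pool->vecs[i];
    if (!v->in_use && v->length >= min_length) {
      v->in_use = true;
      return v->data;
    }
  }
  if (pool->num_vecs == WORK_POOL_MAX) return NULL;  // pool is full
  double *data = calloc(min_length, sizeof(double));
  if (!data) return NULL;
  pool->vecs[pool->num_vecs] = (WorkVec){data, min_length, true};
  pool->num_vecs++;
  return data;
}

// Return a buffer to the pool so a later checkout can reuse it.
void work_pool_restore(WorkPool *pool, const double *data) {
  assert(pool != NULL);
  for (size_t i = 0; i < pool->num_vecs; i++) {
    if (pool->vecs[i].data == data) pool->vecs[i].in_use = false;
  }
}

// Free every cached buffer and reset the pool.
void work_pool_destroy(WorkPool *pool) {
  assert(pool != NULL);
  for (size_t i = 0; i < pool->num_vecs; i++) free(pool->vecs[i].data);
  pool->num_vecs = 0;
}
```

In this sketch, a request for 60 entries made after a 100-entry buffer has been restored returns that same 100-entry buffer rather than allocating a new one.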

jeremylt · 2024-09-26T17:38:01Z

interface/ceed-vector.c

@@ -862,7 +862,7 @@ int CeedVectorPointwiseMult(CeedVector w, CeedVector x, CeedVector y) {
  CeedCall(CeedVectorGetLength(w, &length_w));
  CeedCall(CeedVectorGetLength(x, &length_x));
  CeedCall(CeedVectorGetLength(y, &length_y));
-  CeedCheck(length_w == length_x && length_w == length_y, ceed, CEED_ERROR_UNSUPPORTED,
+  CeedCheck(length_x >= length_w && length_y >= length_w, ceed, CEED_ERROR_UNSUPPORTED,


Needed to allow usage of longer cached vectors than strictly needed
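
The relaxed check can be sketched in plain C as follows. This is an illustration, not the libCEED implementation: `pointwise_mult` is a hypothetical helper working on raw arrays, and it assumes that only the first `length_w` entries of each input are read when the inputs are longer than the output.

```c
#include <stddef.h>

// Hypothetical stand-in for the relaxed length check.
// Computes w[i] = x[i] * y[i] for the first length_w entries.
// Returns 0 on success, 1 if x or y is shorter than w.
int pointwise_mult(double *w, size_t length_w, const double *x, size_t length_x,
                   const double *y, size_t length_y) {
  // Inputs may be longer than the output (e.g. cached work vectors),
  // but never shorter.
  if (length_x < length_w || length_y < length_w) return 1;
  for (size_t i = 0; i < length_w; i++) w[i] = x[i] * y[i];
  return 0;
}
```

With this rule, a cached buffer of length 4 can be used as an input to a length-2 output, while an input shorter than the output is still rejected.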

jeremylt · 2024-09-26T17:38:34Z

interface/ceed-basis.c

@@ -331,11 +331,6 @@ static int CeedBasisApplyAtPointsCheckDims(CeedBasis basis, CeedInt num_elem, co
  if (x_ref != CEED_VECTOR_NONE) CeedCall(CeedVectorGetLength(x_ref, &x_length));
  if (u != CEED_VECTOR_NONE) CeedCall(CeedVectorGetLength(u, &u_length));

-  // Check compatibility of topological and geometrical dimensions


Also needed to allow use of cached vectors that are longer than strictly required

jeremylt · 2024-09-26T17:42:50Z

Blocks #1646

jeremylt · 2024-09-26T18:45:17Z

Branch passes Ratel CI too

$ make prove -j CEED_BACKENDS=/gpu/cuda/shared
-----------------------------------------
|      ____            __           __  |
|     / __ \  ____ _  / /_  ___    / /  |
|    / /_/ / / __ `/ / __/ / _ \  / /   |
|   / _, _/ / /_/ / / /_  /  __/ / /    |
|  /_/ |_|  \__,_/  \__/  \___/ /_/     |
-----------------------------------------

-----------------------------------------

Dependencies:
CEED_DIR      = /home/jeremy/Dev/libCEED
PETSC_DIR     = /home/jeremy/Dev/petsc
PETSC_ARCH    = arch-cuda-mpich

Optional Dependencies:
ENZYME_LIB     = (not found)

-----------------------------------------

Running unit tests
- Testing with libCEED backends: /gpu/cuda/shared
- Testing on 1 processes
prove -j 16 --exec 'python3 tests/junit.py --petsc-arch arch-cuda-mpich --ceed-backends /gpu/cuda/shared --mode tap --nproc 1 --pool-size 1' t000-init t001-view t002-view t003-ts-monitor t004-ts-checkpoint t010-eigensolver t050-mpm t100-static-elasticity t101-static-elasticity t102-static-elasticity t103-static-elasticity t110-static-elasticity t111-static-elasticity t120-static-elasticity t121-static-elasticity t122-static-elasticity t123-static-elasticity t211-quasistatic-elasticity t221-quasistatic-elasticity t222-quasistatic-elasticity ex01-static ex02-quasistatic ex03-dynamic
t050-mpm ..................... ok                                       
t010-eigensolver ............. ok                                       
t000-init .................... ok                                       
t001-view .................... ok                                       
t002-view .................... ok                                       
t003-ts-monitor .............. ok                                       
t101-static-elasticity ....... ok                                       
t122-static-elasticity ....... ok                                       
t123-static-elasticity ....... ok                                       
t100-static-elasticity ....... ok                                       
t111-static-elasticity ....... ok                                       
t211-quasistatic-elasticity .. ok                                       
t103-static-elasticity ....... ok                                       
t222-quasistatic-elasticity .. ok                                       
t110-static-elasticity ....... ok                                       
t120-static-elasticity ....... ok                                       
t102-static-elasticity ....... ok                                       
t221-quasistatic-elasticity .. ok                                       
t121-static-elasticity ....... ok                                       
ex03-dynamic ................. ok                                       
t004-ts-checkpoint ........... ok                                       
ex01-static .................. ok                                       
ex02-quasistatic ............. ok     
All tests successful.
Files=23, Tests=197, 482 wallclock secs ( 0.12 usr  0.04 sys + 1145.66 cusr 120.38 csys = 1266.20 CPU)
Result: PASS

jeremylt · 2024-09-27T15:38:43Z

Note to me for follow-up: Make an issue about adding this code cleanup and simplification to the CPU side of the house

jeremylt added enhancement GPU performance 0-WIP labels Sep 24, 2024

jeremylt force-pushed the jeremy/use-work-vecs branch 5 times, most recently from f9d01df to 9bb9298 Compare September 26, 2024 17:34

jeremylt added 1-In Review and removed 0-WIP labels Sep 26, 2024

jeremylt mentioned this pull request Sep 26, 2024

Must Destroy OperatorField Objects #1646

Merged

jeremylt force-pushed the jeremy/use-work-vecs branch 2 times, most recently from 5ba7b9b to 1945a8d Compare September 26, 2024 18:14

jeremylt mentioned this pull request Sep 27, 2024

CPU Operator Cleanup #1675

Open

jeremylt self-assigned this Sep 27, 2024

jeremylt added 5 commits September 27, 2024 16:27

gpu - refactor ref operator

43e13fe

gpu - further ref refactoring

034f99f

gpu - use cached work vectors across operators

8bbba8c

gpu - only overwrite portion of basis target used

19a04db

minor - bump max it clip on bps slightly

96093a6

jeremylt force-pushed the jeremy/use-work-vecs branch from f1d6ed5 to 96093a6 Compare September 27, 2024 22:28

jeremylt merged commit bdd4742 into main Oct 2, 2024
27 of 28 checks passed

jeremylt deleted the jeremy/use-work-vecs branch October 2, 2024 14:33
