GPU Operators use work vectors #1673
@@ -862,7 +862,7 @@ int CeedVectorPointwiseMult(CeedVector w, CeedVector x, CeedVector y) {
   CeedCall(CeedVectorGetLength(w, &length_w));
   CeedCall(CeedVectorGetLength(x, &length_x));
   CeedCall(CeedVectorGetLength(y, &length_y));
-  CeedCheck(length_w == length_x && length_w == length_y, ceed, CEED_ERROR_UNSUPPORTED,
+  CeedCheck(length_x >= length_w && length_y >= length_w, ceed, CEED_ERROR_UNSUPPORTED,
Needed so that cached vectors longer than strictly required can be used.
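The relaxed check can be illustrated outside of libCEED. The following is a minimal sketch, assuming a plain struct in place of `CeedVector`; `SketchVector` and `SketchPointwiseMult` are hypothetical names used only for illustration and are not part of the libCEED API. The inputs may be longer than the output, and only the first `w->length` entries are touched.

```c
#include <stddef.h>

// Hypothetical stand-in for a vector; this is not the libCEED CeedVector type
typedef struct {
  size_t  length;
  double *data;
} SketchVector;

// Computes w[i] = x[i] * y[i] for every i < w->length.
// Returns 0 on success and 1 if x or y is shorter than w.
int SketchPointwiseMult(SketchVector *w, const SketchVector *x, const SketchVector *y) {
  // Relaxed check: inputs may be longer than the output, but never shorter
  if (x->length < w->length || y->length < w->length) return 1;
  // Any trailing entries of x and y beyond w->length are ignored
  for (size_t i = 0; i < w->length; i++) w->data[i] = x->data[i] * y->data[i];
  return 0;
}
```

With a strict equality check, a cached input of length 4 could not be multiplied into an output of length 2; with the relaxed check it can, and the extra entries are simply not read.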
@@ -331,11 +331,6 @@ static int CeedBasisApplyAtPointsCheckDims(CeedBasis basis, CeedInt num_elem, co
   if (x_ref != CEED_VECTOR_NONE) CeedCall(CeedVectorGetLength(x_ref, &x_length));
   if (u != CEED_VECTOR_NONE) CeedCall(CeedVectorGetLength(u, &u_length));

   // Check compatibility of topological and geometrical dimensions
Also needed to allow the use of cached vectors that are longer than strictly required.
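To show why a caller can end up holding a vector longer than it asked for, here is a minimal sketch of a work-buffer pool. It is an assumption-laden illustration, not the libCEED implementation: `SketchWorkPool`, `SketchGetWork`, `SketchRestoreWork`, and the fixed pool size are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_WORK 8  // Hypothetical fixed pool capacity

// Hypothetical pool of reusable buffers; not the libCEED data structure
typedef struct {
  double *data[SKETCH_MAX_WORK];
  size_t  length[SKETCH_MAX_WORK];
  bool    in_use[SKETCH_MAX_WORK];
  int     num_vecs;
} SketchWorkPool;

// Hands out an idle buffer with at least min_length entries, allocating only if none fits.
// Returns NULL if the pool is full or the allocation fails.
double *SketchGetWork(SketchWorkPool *pool, size_t min_length, size_t *actual_length) {
  for (int i = 0; i < pool->num_vecs; i++) {
    if (!pool->in_use[i] && pool->length[i] >= min_length) {
      pool->in_use[i] = true;
      *actual_length  = pool->length[i];  // May exceed min_length
      return pool->data[i];
    }
  }
  if (pool->num_vecs == SKETCH_MAX_WORK) return NULL;
  double *data = calloc(min_length, sizeof(double));
  if (!data) return NULL;
  int i = pool->num_vecs++;
  pool->data[i]   = data;
  pool->length[i] = min_length;
  pool->in_use[i] = true;
  *actual_length  = min_length;
  return data;
}

// Marks a buffer as idle so that a later request can reuse it
void SketchRestoreWork(SketchWorkPool *pool, double *data) {
  for (int i = 0; i < pool->num_vecs; i++) {
    if (pool->data[i] == data) pool->in_use[i] = false;
  }
}
```

In this sketch, a request for 60 entries after a 100-entry buffer has been restored returns the 100-entry buffer, so any downstream check that demands an exact length match would reject it.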
Blocks #1646
Branch passes Ratel CI too:
$ make prove -j CEED_BACKENDS=/gpu/cuda/shared
-----------------------------------------
| ____ __ __ |
| / __ \ ____ _ / /_ ___ / / |
| / /_/ / / __ `/ / __/ / _ \ / / |
| / _, _/ / /_/ / / /_ / __/ / / |
| /_/ |_| \__,_/ \__/ \___/ /_/ |
-----------------------------------------
-----------------------------------------
Dependencies:
CEED_DIR = /home/jeremy/Dev/libCEED
PETSC_DIR = /home/jeremy/Dev/petsc
PETSC_ARCH = arch-cuda-mpich
Optional Dependencies:
ENZYME_LIB = (not found)
-----------------------------------------
Running unit tests
- Testing with libCEED backends: /gpu/cuda/shared
- Testing on 1 processes
prove -j 16 --exec 'python3 tests/junit.py --petsc-arch arch-cuda-mpich --ceed-backends /gpu/cuda/shared --mode tap --nproc 1 --pool-size 1' t000-init t001-view t002-view t003-ts-monitor t004-ts-checkpoint t010-eigensolver t050-mpm t100-static-elasticity t101-static-elasticity t102-static-elasticity t103-static-elasticity t110-static-elasticity t111-static-elasticity t120-static-elasticity t121-static-elasticity t122-static-elasticity t123-static-elasticity t211-quasistatic-elasticity t221-quasistatic-elasticity t222-quasistatic-elasticity ex01-static ex02-quasistatic ex03-dynamic
t050-mpm ..................... ok
t010-eigensolver ............. ok
t000-init .................... ok
t001-view .................... ok
t002-view .................... ok
t003-ts-monitor .............. ok
t101-static-elasticity ....... ok
t122-static-elasticity ....... ok
t123-static-elasticity ....... ok
t100-static-elasticity ....... ok
t111-static-elasticity ....... ok
t211-quasistatic-elasticity .. ok
t103-static-elasticity ....... ok
t222-quasistatic-elasticity .. ok
t110-static-elasticity ....... ok
t120-static-elasticity ....... ok
t102-static-elasticity ....... ok
t221-quasistatic-elasticity .. ok
t121-static-elasticity ....... ok
ex03-dynamic ................. ok
t004-ts-checkpoint ........... ok
ex01-static .................. ok
ex02-quasistatic ............. ok
All tests successful.
Files=23, Tests=197, 482 wallclock secs ( 0.12 usr 0.04 sys + 1145.66 cusr 120.38 csys = 1266.20 CPU)
Result: PASS
Note to self for follow-up: open an issue about bringing this code cleanup and simplification to the CPU backends.
This takes lessons from the Gen backends and processes the fields in an order that prevents duplicate work while allowing buffers to be reused, specifically the new cached vector buffer.
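One way to picture the field ordering is the sketch below. It assumes that two fields reading the same vector through the same restriction can share one restricted buffer; the types, names, and grouping rule are hypothetical and are not taken from the PR's code.

```c
#include <stdbool.h>

#define SKETCH_MAX_FIELDS 16  // Hypothetical upper bound on fields per operator

// Hypothetical description of an input field: which vector it reads and which restriction it uses
typedef struct {
  int vec_id;
  int rstr_id;
} SketchField;

// Fills order[] with field indices so that fields sharing the same (vector, restriction)
// pair are adjacent, otherwise keeping first-appearance order.
// Returns the number of distinct pairs, i.e. how many restriction applications are needed,
// or -1 if num_fields exceeds SKETCH_MAX_FIELDS.
int SketchOrderFields(const SketchField *fields, int num_fields, int *order) {
  bool placed[SKETCH_MAX_FIELDS] = {false};
  int  num_placed = 0, num_groups = 0;

  if (num_fields > SKETCH_MAX_FIELDS) return -1;
  for (int i = 0; i < num_fields; i++) {
    if (placed[i]) continue;
    num_groups++;
    // Place field i, then pull forward every later field that reads the same data
    for (int j = i; j < num_fields; j++) {
      if (!placed[j] && fields[j].vec_id == fields[i].vec_id && fields[j].rstr_id == fields[i].rstr_id) {
        placed[j]           = true;
        order[num_placed++] = j;
      }
    }
  }
  return num_groups;
}
```

Processing fields in the returned order means the restricted data for one group can stay in a single work buffer until every field in that group has consumed it, after which the buffer is free for the next group.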