[VID] batch commit with GPU is unexpectedly slow #526

Open
mrain opened this issue Mar 19, 2024 · 2 comments
Labels
cappuccino optimize-vid https://www.notion.so/espressosys/9d835f79d4504926b8b3bb3d015abf06?v=b7028cdaea804b7aa918af95c0cd651 vid

Comments

@mrain
Contributor

mrain commented Mar 19, 2024

Check this branch: https://github.com/EspressoSystems/jellyfish/tree/cl/gpu-profiling
Running `cargo test --features gpu-vid,kzg-print-trace,print-trace -p jf-primitives -- profile_gpu_commit --nocapture` gives the result below. Performance degrades as the batch size grows: the total goes from 61.846ms at batch size 1 to 311.326ms at batch size 4096.
However, according to `cargo bench --bench kzg-gpu --features "test-srs icicle"`, the MSM should only cost [28.107 ms 28.438 ms 28.988 ms].

Start:   KZG10::Setup with prover degree 1048576 and verifier degree 1
··Start:   Generating powers of G
··End:     Generating powers of G ..................................................8.384s
End:     KZG10::Setup with prover degree 1048576 and verifier degree 1 .............8.769s
Start:   Type Conversion: ark->ICICLE: Group
End:     Type Conversion: ark->ICICLE: Group .......................................9.590ms
Start:   Load group elements: CPU->GPU
End:     Load group elements: CPU->GPU .............................................7.521ms
Start:   Batch commit 1048576 total elements, batch size 1
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................23.156ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................2.502ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................22.853ms
··Start:   Sync MSM result
··End:     Sync MSM result .........................................................11.730ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................52.750µs
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................182.968µs
End:     Batch commit 1048576 total elements, batch size 1 .........................61.846ms
Start:   Batch commit 1048576 total elements, batch size 8
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................24.932ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................2.627ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................22.863ms
··Start:   Sync MSM result
··End:     Sync MSM result .........................................................27.982ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................49.570µs
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................682.948µs
End:     Batch commit 1048576 total elements, batch size 8 .........................80.681ms
Start:   Batch commit 1048576 total elements, batch size 16
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................22.681ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................5.194ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................98.494ms
··Start:   Sync MSM result
··End:     Sync MSM result .........................................................49.478ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................109.749µs
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................865.120µs
End:     Batch commit 1048576 total elements, batch size 16 ........................178.481ms
Start:   Batch commit 1048576 total elements, batch size 256
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................23.140ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................10.269ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................180.192ms
··Start:   Sync MSM result
··End:     Sync MSM result .........................................................61.028ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................260.128µs
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................3.137ms
End:     Batch commit 1048576 total elements, batch size 256 .......................279.902ms
Start:   Batch commit 1048576 total elements, batch size 1024
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................24.463ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................2.960ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................60.377ms
··Start:   Sync MSM result
··End:     Sync MSM result .........................................................64.456ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................159.259µs
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................12.733ms
End:     Batch commit 1048576 total elements, batch size 1024 ......................167.020ms
Start:   Batch commit 1048576 total elements, batch size 4096
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................23.588ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................5.343ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................198.382ms
··Start:   Sync MSM result
··End:     Sync MSM result .........................................................39.863ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................200.608µs
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................41.532ms
End:     Batch commit 1048576 total elements, batch size 4096 ......................311.326ms
test pcs::univariate_kzg::tests::icicle::profile_gpu_commit ... ok
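
To make the gap against the benchmark easier to see, here is a minimal sketch that tabulates the trace above. It assumes that "GPU-accelerated MSM" plus "Sync MSM result" together approximate the GPU-side cost of the commit (the sync step presumably waits for the asynchronous MSM to finish), and it uses the middle value of the benchmark range quoted above as the reference. The `Run` struct and the helper names are made up for illustration and are not part of jellyfish or ICICLE.

```rust
/// One "Batch commit" block from the trace above; all times in milliseconds.
struct Run {
    batch_size: u32,
    msm_ms: f64,   // "GPU-accelerated MSM"
    sync_ms: f64,  // "Sync MSM result"
    total_ms: f64, // "Batch commit ... batch size N"
}

/// Values copied from the trace above.
const RUNS: [Run; 6] = [
    Run { batch_size: 1, msm_ms: 22.853, sync_ms: 11.730, total_ms: 61.846 },
    Run { batch_size: 8, msm_ms: 22.863, sync_ms: 27.982, total_ms: 80.681 },
    Run { batch_size: 16, msm_ms: 98.494, sync_ms: 49.478, total_ms: 178.481 },
    Run { batch_size: 256, msm_ms: 180.192, sync_ms: 61.028, total_ms: 279.902 },
    Run { batch_size: 1024, msm_ms: 60.377, sync_ms: 64.456, total_ms: 167.020 },
    Run { batch_size: 4096, msm_ms: 198.382, sync_ms: 39.863, total_ms: 311.326 },
];

/// Middle value of the `kzg-gpu` benchmark range quoted above.
const BENCH_MSM_MS: f64 = 28.438;

/// Assumed GPU-side time: MSM call plus the wait for its result.
fn gpu_ms(run: &Run) -> f64 {
    run.msm_ms + run.sync_ms
}

/// How many times slower the traced GPU-side time is than the benchmark.
fn slowdown(run: &Run) -> f64 {
    gpu_ms(run) / BENCH_MSM_MS
}

fn main() {
    for run in &RUNS {
        println!(
            "batch {:>4}: msm+sync {:>8.3} ms of {:>8.3} ms total, {:.1}x the benchmark",
            run.batch_size,
            gpu_ms(run),
            run.total_ms,
            slowdown(run)
        );
    }
}
```

Under that assumption, the GPU-side time is roughly 1.2x the benchmark at batch size 1 and roughly 8.4x at batch size 4096. The trend is not monotonic: batch size 1024 comes out faster than 256.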
mrain added the vid, cappuccino, and optimize-vid (https://www.notion.so/espressosys/9d835f79d4504926b8b3bb3d015abf06?v=b7028cdaea804b7aa918af95c0cd651) labels on Mar 19, 2024
@mrain
Contributor Author

mrain commented Mar 19, 2024

Another odd observation: if you do not warm up a new CUDA stream before each run, performance is also poor, even with a batch size of 1.
cc @alxiong
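
For anyone reproducing this, a minimal warm-up-then-measure harness might look like the sketch below. It is generic on purpose: it does not call ICICLE or CUDA, and the closure passed in is a stand-in for whatever creates the stream and runs the commit. `time_after_warmup` is a hypothetical helper, not an existing function in jellyfish.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `op` `warmup_runs` times without timing it, then times one more call.
/// Returns the result of the timed call together with its wall-clock duration.
fn time_after_warmup<T>(warmup_runs: usize, mut op: impl FnMut() -> T) -> (T, Duration) {
    for _ in 0..warmup_runs {
        // Discard warm-up results; black_box keeps the call from being optimized away.
        black_box(op());
    }
    let start = Instant::now();
    let out = op();
    (out, start.elapsed())
}

fn main() {
    // Placeholder CPU workload standing in for the GPU commit.
    let workload = || (0..1_000_000u64).map(black_box).sum::<u64>();

    let (cold, cold_time) = time_after_warmup(0, workload);
    let (warm, warm_time) = time_after_warmup(3, workload);

    println!("no warm-up: {cold} in {cold_time:?}");
    println!("3 warm-ups: {warm} in {warm_time:?}");
}
```

Comparing the zero-warm-up and warmed-up timings for the same operation is one way to check how much of the single-batch cost comes from stream setup rather than from the MSM itself.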

@mrain
Contributor Author

mrain commented Mar 19, 2024

Also, with a single batch, type conversion takes roughly one-third of the total time. We could eliminate that (see #516).
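
As a quick check of that fraction, the sketch below computes the share of each batch commit spent in the two type conversions, using the timings from the trace in the opening comment. The `Run` struct and `conversion_share` are illustrative names only, not part of jellyfish.

```rust
/// Conversion and total timings copied from the trace in this issue (milliseconds).
struct Run {
    batch_size: u32,
    scalar_conv_ms: f64, // "Type Conversion: ark->ICICLE: Scalar"
    group_conv_ms: f64,  // "Type Conversion: ICICLE->ark: Group"
    total_ms: f64,       // "Batch commit ... batch size N"
}

const RUNS: [Run; 6] = [
    Run { batch_size: 1, scalar_conv_ms: 23.156, group_conv_ms: 0.182968, total_ms: 61.846 },
    Run { batch_size: 8, scalar_conv_ms: 24.932, group_conv_ms: 0.682948, total_ms: 80.681 },
    Run { batch_size: 16, scalar_conv_ms: 22.681, group_conv_ms: 0.865120, total_ms: 178.481 },
    Run { batch_size: 256, scalar_conv_ms: 23.140, group_conv_ms: 3.137, total_ms: 279.902 },
    Run { batch_size: 1024, scalar_conv_ms: 24.463, group_conv_ms: 12.733, total_ms: 167.020 },
    Run { batch_size: 4096, scalar_conv_ms: 23.588, group_conv_ms: 41.532, total_ms: 311.326 },
];

/// Fraction of the batch commit spent converting between ark and ICICLE types.
fn conversion_share(run: &Run) -> f64 {
    (run.scalar_conv_ms + run.group_conv_ms) / run.total_ms
}

fn main() {
    for run in &RUNS {
        println!(
            "batch {:>4}: conversions take {:.1}% of {:.3} ms",
            run.batch_size,
            100.0 * conversion_share(run),
            run.total_ms
        );
    }
}
```

For batch size 1 this gives a bit under 38%. The scalar conversion stays near 23ms across batch sizes, while the ICICLE->ark group conversion grows with the batch size (41.532ms at 4096).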
