Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: overhead free type conversion for load_poly_to_gpu #516

Merged
merged 10 commits into from
Apr 1, 2024
Merged

Conversation

mrain
Copy link
Contributor

@mrain mrain commented Mar 14, 2024

Description

Update: this also fixes the API changes when gpu_vid is enabled.

This PR avoids a memory copy in load_poly_to_gpu. However it contains unsafe rust code.

Before:

Start:   KZG10::Setup with prover degree 4194304 and verifier degree 1
...
Start:   Type Conversion: ark->ICICLE: Scalar
End:     Type Conversion: ark->ICICLE: Scalar ......................................17.305ms
...

After:

Start:   KZG10::Setup with prover degree 4194304 and verifier degree 1
...
Start:   Type Conversion: ark->ICICLE: Scalar
End:     Type Conversion: ark->ICICLE: Scalar ......................................160ns
...

It doesn't apply to affine point conversion because of two different underlying struct reprs.


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (main)
  • Linked to GitHub issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the GitHub PR explorer

@alxiong
Copy link
Contributor

alxiong commented Mar 14, 2024

This is pretty impressive speed-up, but you pushed the instance to be 2^22 to make the gap meaningful, for much smaller instances (which is our case), the gap is so small, (correct?)

  • I have no objection of using this unsafe, since it's very limited scope, and easy to reason about. As long as we add some tests to constantly watching for the "same underlying repr" assumption is correct, I'm comfortable. (ICICLE is using tones of unsafe anyway.)
  • I vote we turn this into a draft PR and archive it (instead of merging it). and use it in the future when the instance size become so large that conversion time is a bottleneck.

wdyt? @chancharles92 @mrain

@mrain mrain requested review from alxiong and ggutoski April 1, 2024 19:03
@@ -1401,6 +1407,26 @@ mod tests {
test_gpu_e2e_template::<Bn254>().unwrap();
}

fn test_gpu_ark_conversion_template<E: Pairing>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this test. But can we do more? Earlier code comment says:

We assume that two types use the same underline repr.

In that case, we should be able to assert that the bytes are exactly the same. That way, if an incompatibility is introduced then the test will detect it immediately, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit annoying but doable: c82e230

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I don't fully understand this new test but here's what I see:

  1. make a slice of random new ark scalars in scalars
  2. convert them into a IC scalars slice ic_scalars via from_ark.
  3. copy the slice into a new GPU slice called d_scalars.
  4. transform the GPU slice to Montgomery form on the GPU
  5. copy the GPU scalars out of GPU and back into ic_scalars
  6. these ic_scalars (in Montgomery form) should be byte-for-byte identical to the original scalars. Check this via align_to

Not sure why you need to copy to/from GPU but AFIACT this test does what it's supposed to.

@ggutoski
Copy link
Contributor

ggutoski commented Apr 1, 2024

  • I have no objection of using this unsafe, since it's very limited scope, and easy to reason about. As long as we add some tests to constantly watching for the "same underlying repr" assumption is correct, I'm comfortable. (ICICLE is using tones of unsafe anyway.)

We could add a test to enforce this, no? See #516 (comment)

  • I vote we turn this into a draft PR and archive it (instead of merging it). and use it in the future when the instance size become so large that conversion time is a bottleneck.

I think I disagree. An archived PR will experience bit rot and get forgotten. Is your concern only about the unsafe code? Perhaps that concern could be mitigated with a stricter test described above.

@mrain mrain requested a review from ggutoski April 1, 2024 20:46
@mrain mrain merged commit 4a2e7b7 into main Apr 1, 2024
5 checks passed
@mrain mrain deleted the cl/unsafe branch April 1, 2024 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants