Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New "packed" random oracle input type #10005

Merged
merged 19 commits into from
Jan 21, 2022
Merged

Conversation

mrmr1993
Copy link
Member

This PR is based off PR #9935, but refactored to change only the code that relates to random oracle input. In order to compile, this includes changes that were previously scattered across PRs #9933, #9934, #9935, #9936, #9937, #9938, #9939, #9940, with some modifications to separate out the other unrelated changes that were also bundled with them.

This PR takes a slightly different strategy to #9935, by keeping the random oracle inputs (and the signatures derived from them) separated as Random_oracle.Input.Chunked and Random_oracle.Input.Legacy (equivalently Schnorr.Chunked, Schnorr.Legacy), so that it is clear at every usage which of the two input systems is used.

This also makes the requisite changes for the client SDK and for rosetta, ensuring that both continue to use the Legacy hashing/signature mode for compatibility purposes. As in the original, 'signed transactions' also use the Legacy hashing/signature mode, to ensure that various implementations do not need to be updated.


The term 'chunk' refers to a number less than the size of a field element, which we treat as an indivisible bitstring and chain together to form field elements by computing a + b * 2^n + c * 2^(n+m) + .... For example, consider

u16 // 8-bit integer
u32 // 32-bit integer
u64 // 64-bit integer
// The 'chunking' algorithm will group the following
[u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u32, u32, u32, u32, u32, u32, u32, u32, u64, u64, u64, u64]
// into the following batches
[ [u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16, u16], // 240 bits
  [u16, u32, u32, u32, u32, u32, u32, u32], // 240 bits
  [u32, u64, u64, u64], // 224 bits
  [u64] // 64 bits
]

This differs from the original implementation, which decomposed all non-field-element data into bits and recombined those bits back into field elements. This decomposition and recomposition is significantly more expensive than the extra hashing we incur by not doing so.

Checklist:

  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? List them

@mrmr1993 mrmr1993 added the ci-build-me Add this label to trigger a circle+buildkite build for this branch label Jan 11, 2022
@mrmr1993 mrmr1993 requested review from a team as code owners January 11, 2022 21:41
Copy link
Member

@bkase bkase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy around client-sdk and Rosetta looks good 👍

@mrmr1993 mrmr1993 requested a review from a team as a code owner January 12, 2022 03:38
This reverts commit 53a929f.

NOTE: fixing this programmatically from the log involves (from vim)
:tabe src/app/replayer/test/archive_db.sql
:vsplit ~/Downloads/mina_build_XXXXX_replayer-test.log

You can set up a (slow) macro to perform all of the replacements with

qq/expected_ledger_hash
2f""ayi"4f""syi"h:%s/a/s/g
lq

Then, for N the number of replacements, we run

N@q

and go grab a coffee. This *WILL* lock up your vim instance for some
time.
@jspada
Copy link
Contributor

jspada commented Jan 13, 2022

Should this one be closed now #9935

@mrmr1993
Copy link
Member Author

mrmr1993 commented Jan 13, 2022

Should this one be closed now #9935

When this is merged, the surrounding PRs should be readjusted to avoid it. There's still unmerged code in those 7 other PRs

@jspada
Copy link
Contributor

jspada commented Jan 18, 2022

Based on the description, it seems like this can have collisions.

Simple collision example, assume field element size is 8-bits to keep it short

Equation from above: a + b * 2^n + c * 2^(n+m) + ...

                   a0   b0    a1   b1
M = m4 m4 m4 m4 = [0000 0000, 0000 1000]
                = [0 + 0*2^4, 0 + 8*2^4]
                = [0, 128]

             a0    b0
N = n4 n8 = [0000, 10000000]
          = [0, 128]

I've probably missed something important.

@mrmr1993
Copy link
Member Author

Based on the description, it seems like this can have collisions.

Indeed it could, except we only use this random oracle input type when the structure of the underlying data is fixed, so it should be a non-issue. Perhaps part of the code review should be confirming that, though?

@jspada
Copy link
Contributor

jspada commented Jan 19, 2022

Based on the description, it seems like this can have collisions.

Indeed it could, except we only use this random oracle input type when the structure of the underlying data is fixed, so it should be a non-issue. Perhaps part of the code review should be confirming that, though?

Either that or making it safe no matter how it's used. I suspect the reason it's this way is for performance in the snark, so the latter may be too expensive, correct?

@mrmr1993
Copy link
Member Author

Either that or making it safe no matter how it's used. I suspect the reason it's this way is for performance in the snark, so the latter may be too expensive, correct?

The latter makes the snark more expensive, yes. We should write a variant of this called Chunked_variable_length or some such for this, but I don't think this PR is the place for it.

Copy link
Contributor

@jspada jspada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge diff!
Checked things carefully for mistakes.
Left a lot of questions and some suggestions.
Approved!

@querolita
Copy link
Member

Going back to the encoding of the input, when you say that the underlying data structure is fixed you mean that you would never be using different types of input vectors as part of the same scenario? Meaning, by the context one would be able to know what is the underlying composition of the chunk? Otherwise, one would need some bits of information to differentiate between different "recipes" leading to those chunks. For a shorter example of 32bit chunks, these could either be [u32] or [u16, u16]. When the output domain only has (in this case) 2^32 possible values, in order to avoid collisions, one can only have at most 2^32 different inputs. But given that 32bit chunks can have 2 compositions, one instead would have twice that input domain size, and thus collisions. So if the underlying structured is "assumed" and known a priori, then there seems to be no collisions.

@mrmr1993
Copy link
Member Author

Going back to the encoding of the input, when you say that the underlying data structure is fixed you mean that you would never be using different types of input vectors as part of the same scenario? Meaning, by the context one would be able to know what is the underlying composition of the chunk?

Correct, yes. We only use poseidon hashing where we need to mirror the hashing within a snark circuit; because the format of the data in the snark circuit is necessarily fixed by the permutation argument, we always know that the data will be laid out in exactly the same way.

Otherwise, one would need some bits of information to differentiate between different "recipes" leading to those chunks. For a shorter example of 32bit chunks, these could either be [u32] or [u16, u16]. When the output domain only has (in this case) 2^32 possible values, in order to avoid collisions, one can only have at most 2^32 different inputs. But given that 32bit chunks can have 2 compositions, one instead would have twice that input domain size, and thus collisions. So if the underlying structured is "assumed" and known a priori, then there seems to be no collisions.

Agreed. As the proof system becomes more flexible (and as SnarkyJS becomes more able to use that flexibility) we'll probably want to create a version of this that does the 'safe' thing, but for now I believe it isn't an issue.

@mrmr1993 mrmr1993 force-pushed the feature/new-random-oracle-input branch from dfd8c2c to e1eb5fa Compare January 21, 2022 20:20
@mrmr1993
Copy link
Member Author

After some debugging, this PR now increases the amounts in the payment test by 10x, to ensure that block rewards do not cause the balance of the timed account to fall back into the valid range. (cc @QuiteStochastic)

@mrmr1993 mrmr1993 merged commit 8f5adb8 into develop Jan 21, 2022
@mrmr1993 mrmr1993 deleted the feature/new-random-oracle-input branch January 21, 2022 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-build-me Add this label to trigger a circle+buildkite build for this branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants