feat: Testnet Deploy Branch #9788

Closed
wants to merge 56 commits into from

Conversation

ludamad
Collaborator

@ludamad ludamad commented Nov 6, 2024

DO NOT MERGE
Once complete this may be merged, but the ideal is to still have changes hit master first, and this to be only cherry-picks and patches that truly have to diverge.

For testnet deploy 1 milestone.
We aim to get stable:

  • 48+ validators
  • proving at 0.1 TPS with prover coordination
  • reorgs

Testing plan:

Metrics plan:

We will not be merging:

  • blobs

@ludamad ludamad added the S-do-not-merge Status: Do not merge this PR label Nov 6, 2024
@ludamad ludamad changed the title WIP: Deploy 1 Dev Branch feat: Deploy 1 Dev Branch Nov 6, 2024
just-mitch and others added 3 commits November 6, 2024 22:36
The sequencer now has explicit timeliness requirements for itself with
respect to block building.

We also log a metric for the amount of time left (or overconsumed) when
trying to transition from one state to the next.

Also fix a bug where we could permanently fail to build any blocks if
our publish failed.

Gate the strictness behind a flag `SEQ_ENFORCE_TIME_TABLE`. Enable the flag by default in our k8s deployments.
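A minimal sketch of how such a gated time-table check could look. The state names, the per-state deadlines, and the `checkTransition` helper are all hypothetical and are not taken from the sequencer code; only the idea (log time left or overconsumed, fail only when enforcement is on) comes from the description above.

```typescript
// Hypothetical state names and deadlines, for illustration only.
type SequencerState = "CREATING_BLOCK" | "COLLECTING_ATTESTATIONS" | "PUBLISHING_BLOCK";

// Latest point within a slot (seconds since slot start) at which each state may be entered.
const MAX_SECONDS_INTO_SLOT: Record<SequencerState, number> = {
  CREATING_BLOCK: 6,
  COLLECTING_ATTESTATIONS: 12,
  PUBLISHING_BLOCK: 18,
};

class SequencerTooSlowError extends Error {
  constructor(state: SequencerState, overrunSeconds: number) {
    super(`Too slow to enter ${state}: over budget by ${overrunSeconds}s`);
    this.name = "SequencerTooSlowError";
  }
}

/**
 * Records the time left (negative when overconsumed) for a state transition.
 * Throws only when enforcement is enabled, mirroring a flag such as SEQ_ENFORCE_TIME_TABLE.
 */
function checkTransition(
  state: SequencerState,
  secondsIntoSlot: number,
  enforceTimeTable: boolean,
  recordMetric: (state: SequencerState, secondsLeft: number) => void,
): number {
  const secondsLeft = MAX_SECONDS_INTO_SLOT[state] - secondsIntoSlot;
  recordMetric(state, secondsLeft); // the metric is logged whether or not we enforce
  if (enforceTimeTable && secondsLeft < 0) {
    throw new SequencerTooSlowError(state, -secondsLeft);
  }
  return secondsLeft;
}
```

With enforcement off, an overrun only shows up in the metric; with it on, the transition is rejected so the sequencer can abandon the slot instead of publishing late.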
@ludamad ludamad changed the title feat: Deploy 1 Dev Branch feat: Tesnet Deploy Branch Nov 7, 2024
Makes the proof size of ECCVM constant by making the sumcheck gate
challenges and IPA constant.
Fixes the ECCVM recursive verifier size (besides the MSM in the IPA
Recursive verifier) as a result.

Closes AztecProtocol/barretenberg#1009.
@ludamad ludamad changed the title feat: Tesnet Deploy Branch feat: Testnet Deploy Branch Nov 7, 2024
AztecBot and others added 19 commits November 7, 2024 02:24
subrepo:
  subdir:   "barretenberg"
  merged:   "e049abf9a3"
upstream:
  origin:   "https://github.com/AztecProtocol/barretenberg"
  branch:   "master"
  commit:   "e049abf9a3"
git-subrepo:
  version:  "0.4.6"
  origin:   "???"
  commit:   "???" [skip ci]
subrepo:
  subdir:   "noir-projects/aztec-nr"
  merged:   "51ad865ec6"
upstream:
  origin:   "https://github.com/AztecProtocol/aztec-nr"
  branch:   "master"
  commit:   "51ad865ec6"
git-subrepo:
  version:  "0.4.6"
  origin:   "???"
  commit:   "???" [skip ci]
Closes: #9371
#9370
#9372

Blindly moves the logic in `note_processor.ts` to the
`simulator_oracle`, so retrieved logs can be processed and injected in
PXE's db. This approach has several problems:

- We have to trigger a process from an oracle, but the thing itself
requires a simulator to call `compute_note_hash_and_nullifier`. This
either implies moving a lot of stuff into `client_execution_context`
(namely, the key store), or a chicken and egg problem with
`simulator_oracle` (we need a simulator in the oracle that is provided
to a simulator). Right now a very ugly solution is adopted,
instantiating a new simulator in place.
- ~~Are deferred notes even necessary now?~~ No, will be removed in
#9575
- Is there an alternative to passing the `dataStartIndexForTx` all
the way from the node to compute the note index? (this is not too bad
IMO)
- Can the logic in the processor be simplified?
- Is there a better place to put this code that still allows us to
initiate the process from `aztec.nr`?

---------

Co-authored-by: Nicolás Venturo <[email protected]>
Fixes #8328 by using boring names. 

Ends up with a lot of tiny changes in the form of paths and variable
names etc. Some of the environment variable names are changed.



![The Big Lebowski Goodnight Sweet Prince GIF](https://media1.giphy.com/media/jtUiNaYnqZZlu/giphy.gif)

---------

Co-authored-by: just-mitch <[email protected]>
…9726)

As a part of ZK-fication of Honk, we have to mask the evaluations of
round univariates that the prover sends to the verifier. The evaluations
were masked in Sumcheck in PR #7517. However, the logic for proving
evaluations of Libra masking polynomials was missing. This PR fixes this
issue and enables efficient batch opening of these polynomials.
* Added necessary logic to Shplonk Prover, Shplemini Prover, and
Shplemini Verifier
* Better handling of the ZKSumcheckData
* Removed methods and reverted changes that became obsolete because of
the new ZK strategy
* Enabled the opening of Libra masking univariates in ECCVM and
Translator
…raits.rs (#9406)

- Improved the input validation logic for the 'clean' command in
`bootstrap.sh` to accept various forms of confirmation such as 'y', 'Y',
'yes', and 'YES'.
- Refactored `bit_traits.rs` to reduce code duplication and improve
efficiency:
  - Utilized `leading_zeros()` to optimize the `get_msb` function.
- Implemented a generalized `BitsQueryable` trait for numeric types,
reducing the need for multiple implementations.
- Maintained individual implementations for `FieldElement` and
`MemoryAddress` to prevent type-related issues.
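The `leading_zeros()` trick for `get_msb` can be illustrated as follows. This is a TypeScript analogue, not the Rust code from `bit_traits.rs`: it uses `Math.clz32`, which counts leading zero bits of a 32-bit unsigned integer, and the function name `getMsb32` is hypothetical.

```typescript
/**
 * Index of the most significant set bit of a 32-bit unsigned integer
 * (bit 0 is the least significant bit).
 * Illustrative analogue of computing the MSB from a leading-zero count.
 */
function getMsb32(value: number): number {
  if (!Number.isInteger(value) || value <= 0 || value > 0xffffffff) {
    throw new RangeError("value must be an integer in [1, 2^32 - 1]");
  }
  // A 32-bit word with k leading zeros has its top set bit at index 31 - k.
  return 31 - Math.clz32(value);
}
```

For example, `getMsb32(8)` is 3 because 8 is `0b1000`, which has 28 leading zeros in a 32-bit word.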

---------

Co-authored-by: Facundo <[email protected]>
Seeing how these fare with the different test setup, since the gerousia
one has been running for a few days.

Changes:
- e2e_p2p was missing from the p2p config, so even though the above states
gerousia was enabled a few days ago, it never actually ran
- each e2e_p2p test gets its own runner
- labels now match a prefix in the e2e test config file rather than
requiring an exact match; this allows the existing e2e-p2p label to cover
all of the new tests
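The label change above amounts to switching from exact to prefix matching. A small sketch, assuming test names and labels share the same separator; the test names used here are made up and the real CI config format is not shown in this PR:

```typescript
// Old behaviour (as described): a label selects only the test with exactly that name.
function testsForLabelExact(label: string, testNames: string[]): string[] {
  return testNames.filter((name) => name === label);
}

// New behaviour (as described): a label selects every test whose name starts with it.
function testsForLabelPrefix(label: string, testNames: string[]): string[] {
  return testNames.filter((name) => name.startsWith(label));
}

// Hypothetical test names, for illustration only.
const exampleTests = ["e2e-p2p-gossip", "e2e-p2p-reqresp", "e2e-token"];
const selectedByPrefix = testsForLabelPrefix("e2e-p2p", exampleTests);
const selectedExactly = testsForLabelExact("e2e-p2p", exampleTests);
```

Here `selectedByPrefix` contains both p2p tests, while `selectedExactly` is empty because no test is named exactly `e2e-p2p`.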
Resolves #9592
- Contract artifacts must now have VKs in their private functions
- aztec-nargo inserts the verification keys after public function
transpilation
- We no longer derive any VK in the TX proving flow
- App VKs are now constrained in the private kernels
- Bootstrap generates VKs for all apps (with s3 caching)
- PXE currently accepts any VK present in the artifact as valid: we
should explore the correct interface for this in the future and whether
PXE can use those VKs without rederiving them from ACIR
This PR:
1. Adds Origin Tags for tracking dangerous interactions to all stdlib
memory primitives
2. Expands the tests from TwinRomTable
3. Fixes a bug with the use of a non-normalized value.
…9710)

Get access to all of the p2p metrics counts to help investigate issues

fixes: #9691
Makes metrics collection time configurable, and reduces collection time in
e2e tests with metrics collection enabled
Adds a doc with info about how to enable client side proving in the
sandbox.

closes AztecProtocol/dev-rel#442

---------

Co-authored-by: saleel <[email protected]>
stevenplatt and others added 2 commits November 7, 2024 22:39
# Change Log

- Change boot node service to headless for pod-level dns

Currently pod dns names do not resolve in Google Kubernetes Engine, even
though they resolve in EKS (a difference of cluster-level DNS
implementation). Setting the boot node service to headless prevents it
from being assigned an IP and instead inserts the required pod DNS
entries. This change has been tested both in AWS and GCloud.
Maddiaa0 and others added 15 commits November 8, 2024 18:53
We achieve ZK in Shplemini as follows. Before batching the multilinear
evaluation claims obtained as the sumcheck output, the Gemini prover
* creates a random polynomial M of the circuit size;
* commits to M using KZG/IPA, sends the commitment to the verifier;
* evaluates M at the sumcheck challenge, sends the evaluation to the
verifier.

The verifier simply adds this new commitment and the appropriate scalar
multiplier to the BatchOpeningClaim.
We can print `std::chrono` durations in our usual PR test builds but not on
mac. This should fix the mac build if this is the only issue.
…9862)

These test files are used outside this module. The rebuild pattern has
been made more conservative.
The L1-L2 message tree height was a bottleneck for long-running networks:
```
post-mortem of 1-validator network (bot set to 0.05 TPS, 1 private / 2 public transfers per tx)
Lasted long, got to block 4091, last tried to propose block 4097
Hit issues and did not reorg past them
Root issue (guess):
2024-11-05 08:32:53.148	Error assembling block: 'Error: Failed to append leaves: Tree is full'
```
Also updated tree heights with constants proposed by @iAmMichaelConnor
here (#9451)
(thanks for the thoughtful analysis I could lazily steal!)
An automated test is a bit awkward here or I'd write one. It'd either
trivially pass or have to go through 3 days' worth of transactions.
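The "Tree is full" failure follows from simple capacity arithmetic: an append-only tree of height h holds 2^h leaves, so a fixed number of leaves appended per block bounds the number of blocks. A rough sketch; the function names are hypothetical, and the height and per-block leaf count in the example are assumptions rather than values stated in this PR:

```typescript
// Number of leaves an append-only Merkle tree of the given height can hold.
function treeCapacity(height: number): number {
  if (!Number.isInteger(height) || height < 0 || height > 52) {
    throw new RangeError("height must be an integer in [0, 52]");
  }
  return Math.pow(2, height);
}

// Blocks that can be built before the tree fills, assuming every block
// appends the same fixed number of leaves (padding included).
function blocksUntilFull(height: number, leavesPerBlock: number): number {
  if (!Number.isInteger(leavesPerBlock) || leavesPerBlock <= 0) {
    throw new RangeError("leavesPerBlock must be a positive integer");
  }
  return Math.floor(treeCapacity(height) / leavesPerBlock);
}

// Assumed example values: height 16 and 16 leaves per block.
const exampleBlockLimit = blocksUntilFull(16, 16);
```

With these assumed values the tree fills after 4096 blocks. Whether that matches the actual configuration behind the post-mortem is not stated in the source. Each extra level of height doubles the limit.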
subrepo:
  subdir:   "barretenberg"
  merged:   "1a334c80aa"
upstream:
  origin:   "https://github.com/AztecProtocol/barretenberg"
  branch:   "master"
  commit:   "1a334c80aa"
git-subrepo:
  version:  "0.4.6"
  origin:   "???"
  commit:   "???" [skip ci]
subrepo:
  subdir:   "noir-projects/aztec-nr"
  merged:   "530fb20e9d"
upstream:
  origin:   "https://github.com/AztecProtocol/aztec-nr"
  branch:   "master"
  commit:   "530fb20e9d"
git-subrepo:
  version:  "0.4.6"
  origin:   "???"
  commit:   "???" [skip ci]
De-enshrines the following constants and turns them into config:

```
ETHEREUM_SLOT_DURATION = 12;
AZTEC_SLOT_DURATION = 24;
AZTEC_EPOCH_DURATION = 16;
AZTEC_TARGET_COMMITTEE_SIZE = 48;
AZTEC_EPOCH_PROOF_CLAIM_WINDOW_IN_L2_SLOTS = 13;
```

These can now be set via env vars. On L1, they are set as immutable
variables across all contracts that require them. As for circuits, none
of them was needed, except for the epoch duration to be able to
dimension the fees array. This was handled by introducing a new
MAX_EPOCH_DURATION constant (32) which sets the max length of the array.
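A sketch of what reading these values from the environment could look like. It assumes the env var names mirror the constant names, which the description does not confirm; `ChainTimingConfig`, `readPositiveInt`, and `loadChainTimingConfig` are hypothetical names. The defaults are the previously enshrined values listed above.

```typescript
interface ChainTimingConfig {
  ethereumSlotDuration: number;
  aztecSlotDuration: number;
  aztecEpochDuration: number;
  aztecTargetCommitteeSize: number;
  aztecEpochProofClaimWindowInL2Slots: number;
}

// Upper bound from the description: it sets the max length of the fees array.
const MAX_EPOCH_DURATION = 32;

type Env = Record<string, string | undefined>;

// Parses a positive integer env var, falling back to a default when unset.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Assumption: env var names are identical to the old constant names.
function loadChainTimingConfig(env: Env): ChainTimingConfig {
  const config: ChainTimingConfig = {
    ethereumSlotDuration: readPositiveInt(env, "ETHEREUM_SLOT_DURATION", 12),
    aztecSlotDuration: readPositiveInt(env, "AZTEC_SLOT_DURATION", 24),
    aztecEpochDuration: readPositiveInt(env, "AZTEC_EPOCH_DURATION", 16),
    aztecTargetCommitteeSize: readPositiveInt(env, "AZTEC_TARGET_COMMITTEE_SIZE", 48),
    aztecEpochProofClaimWindowInL2Slots: readPositiveInt(
      env,
      "AZTEC_EPOCH_PROOF_CLAIM_WINDOW_IN_L2_SLOTS",
      13,
    ),
  };
  if (config.aztecEpochDuration > MAX_EPOCH_DURATION) {
    throw new Error(`AZTEC_EPOCH_DURATION must not exceed ${MAX_EPOCH_DURATION}`);
  }
  return config;
}
```

Taking the environment as a parameter (for example `loadChainTimingConfig(process.env)` in Node) keeps the function easy to test with plain objects.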

This is a prerequisite to #9809
Reverts changes to the file made in
90696cd.
Also removed the use of `npx aztec-app` to command the
sandbox; those commands have been removed from that tool anyway.
**Adds schemas for every API.** Every API exposed via JSON RPC now
requires a zod schema (see #9656 for more context on the rationale for
this change). All schemas are in `circuit-types/interfaces`, and look
like:


https://github.com/AztecProtocol/aztec-packages/blob/3e78ec721285fcd533cff61329a8e156958e2d65/yarn-project/circuit-types/src/interfaces/prover-node.ts#L33-L45

These schemas are type-checked against the interface via the
`ApiSchemaFor` utility type, so if the interface changes, schemas are
required by the compiler to change as well. Schemas are now used in the
JSON RPC server to 1) identify which methods are exposed (so we no
longer need the method disallowlist) and 2) parse their arguments. The
JSON RPC server, once it has identified the method to be called, grabs
the arguments schema and funnels the result of a vanilla JSON parse
through it.
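The dispatch flow described above can be sketched without any dependencies. This is not the zod-based implementation and does not reproduce `ApiSchemaFor`: hand-rolled parser functions stand in for zod schemas, and `ExampleService`, `bind`, and `handleRequest` are hypothetical names. It shows only the two roles of the schema table: deciding which methods are exposed, and parsing arguments after a vanilla `JSON.parse`.

```typescript
// Stand-in for a schema: turns unknown input into a typed value or throws.
type Parser<T> = (input: unknown) => T;

const parseAddArgs: Parser<[number, number]> = (input) => {
  if (!Array.isArray(input) || input.length !== 2) throw new TypeError("expected two arguments");
  const [a, b] = input;
  if (typeof a !== "number" || typeof b !== "number") throw new TypeError("expected numbers");
  return [a, b];
};

const parseGreetArgs: Parser<[string]> = (input) => {
  if (!Array.isArray(input) || input.length !== 1) throw new TypeError("expected one argument");
  const [name] = input;
  if (typeof name !== "string") throw new TypeError("expected a string");
  return [name];
};

// Hypothetical service; `stop` is deliberately given no schema entry below.
class ExampleService {
  add(a: number, b: number): number { return a + b; }
  greet(name: string): string { return `hello ${name}`; }
  stop(): void { /* service management, not meant to be remotely callable */ }
}

// Pairs an argument parser with a method, so arguments are always validated first.
function bind<A extends unknown[], R>(parse: Parser<A>, fn: (...args: A) => R): (params: unknown) => R {
  return (params) => fn(...parse(params));
}

const service = new ExampleService();
const exposed = new Map<string, (params: unknown) => unknown>([
  ["add", bind(parseAddArgs, (a, b) => service.add(a, b))],
  ["greet", bind(parseGreetArgs, (name) => service.greet(name))],
]);

function handleRequest(rawJson: string): unknown {
  const request: unknown = JSON.parse(rawJson); // vanilla JSON parse
  if (typeof request !== "object" || request === null) throw new Error("Invalid request");
  const { method, params } = request as { method?: unknown; params?: unknown };
  const handler = typeof method === "string" ? exposed.get(method) : undefined;
  if (!handler) throw new Error("Method not found"); // only methods with a schema are exposed
  return handler(params);
}
```

Because exposure is an allowlist derived from the schema table, a method such as `stop` is unreachable unless someone adds an entry for it.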

Every type or struct that is exposed via an interface now has an
associated schema, which is referenced in the API for parsing. Schemas
both validate input and hydrate instances. This means that we no longer
set a `type` property to identify how to hydrate each object in a
request during deserialization, which was a security risk.
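To illustrate "validate and hydrate" without a `type` tag, here is a small dependency-free sketch. `ExampleHash` and `exampleHashSchema` are hypothetical stand-ins, not classes from the codebase. The point is that the receiving code picks the schema, and therefore the class to instantiate, so the payload cannot steer hydration.

```typescript
// Hypothetical value class that we want real instances of after deserialization.
class ExampleHash {
  constructor(public readonly hex: string) {}
  equals(other: ExampleHash): boolean {
    return this.hex === other.hex;
  }
}

// Validates the wire format (0x-prefixed, 32 bytes of lowercase hex) and
// returns a hydrated instance. No `type` field is read from the input.
function exampleHashSchema(input: unknown): ExampleHash {
  if (typeof input !== "string" || !/^0x[0-9a-f]{64}$/.test(input)) {
    throw new TypeError("expected a 0x-prefixed 32-byte hex string");
  }
  return new ExampleHash(input);
}
```

An input like `{ type: "ExampleHash", hex: "0x..." }` is simply rejected here, since the schema expects a string and never consults a caller-supplied type name.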


https://github.com/AztecProtocol/aztec-packages/blob/3e78ec721285fcd533cff61329a8e156958e2d65/yarn-project/circuit-types/src/l2_block.ts#L24-L32

Schemas are also used in the JSON RPC client for deserializing the
result types. Again, this lets us remove the `type` parameter from all
serialized entities, though this is still present since it is
required by the `TypeRegistry` (still to be removed) which is only used
in the snapshot manager.

All schemas are tested via mini integration tests. These tests define a
mock implementation for each service, use it for setting up a JSON RPC
server, starting it in a free port, and test calling every method
through JSON RPC.


https://github.com/AztecProtocol/aztec-packages/blob/3e78ec721285fcd533cff61329a8e156958e2d65/yarn-project/circuit-types/src/interfaces/prover-node.test.ts#L12-L31

These changes prompted other changes. For instance, we introduced the
following changes to APIs:

- `ProvingJobSource.rejectProvingJob` now accepts a reason `string`
instead of an `Error` type
- `PXE.getEvents(type)` is removed in favor of `PXE.getEncryptedEvents`
and `PXE.getUnencryptedEvents` since both methods required different
arguments

We also removed service-management methods (ie `stop`) from interfaces.
We were inadvertently calling `stop` on remote instances over http when
we shouldn't have. We also typed some previously untyped interfaces,
such as the TXE's.

Fixes #9455
This was sometimes erroring out on the first attempt while awaiting the
transaction to settle,
but then subsequent calls were attempting to reinitialize.

Added better logging to help clarify. 

Also adjust the validator url in the template to be the top level
service, to better distribute transactions across the network and not
rely so heavily on the boot node for gossiping.

---------

Co-authored-by: ludamad <[email protected]>
@ludamad ludamad added the e2e-all CI: Enables this CI job. label Nov 11, 2024
@ludamad
Collaborator Author

ludamad commented Nov 11, 2024

Cherry-picked a batch:

  • bb prereqs for this commit and this commit 9bc5a2f, ~halves tube circuit
  • relevant fixes from tmnt and alpha team
  • fix for tree constants limiting block production to 3 days

@ludamad ludamad changed the base branch from master to release-base-ci-hack November 11, 2024 07:53
@ludamad ludamad added the network-all Run this CI job. label Nov 11, 2024
@ludamad
Collaborator Author

ludamad commented Nov 11, 2024

Given issues with tests, just force-pushed master onto this branch

@ludamad ludamad removed the request for review from Maddiaa0 November 11, 2024 10:36
@ludamad ludamad closed this Nov 22, 2024