-
-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: async shuffling refactor #6938
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #6938 +/- ##
============================================
+ Coverage 49.35% 50.84% +1.49%
============================================
Files 592 597 +5
Lines 39293 39835 +542
Branches 2248 2058 -190
============================================
+ Hits 19392 20254 +862
+ Misses 19860 19581 -279
+ Partials 41 0 -41 |
Performance Report✔️ no performance regression detected Full benchmark results
|
This reverts commit 104aa56.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks awesome, I think we should wait for v1.22 to be released first before merging this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason why we use both upcoming and next epoch terminology?
Yes. The slot gets incremented between |
Can you commit this somewhere in the code as a jsdoc comment? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - great work!
// in this context but that is not actually true because the state transition happens in the last 4 seconds of the | ||
// epoch. For the context of this function "upcoming epoch" is used to denote the epoch that will begin after this | ||
// function returns. The epoch that is "next" once the state transition is complete is referred to as the | ||
// epochAfterUpcoming for the same reason to help minimize confusion. | ||
const upcomingEpoch = this.nextEpoch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I really like the name "upcomingEpoch" in this context 👍
* feat: add ShufflingCache to EpochCache * fix: implementation in state-transition for EpochCache with ShufflingCache * feat: remove shufflingCache.processState * feat: implement ShufflingCache changes in beacon-node * feat: pass shufflingCache when loading cached state from db * test: fix state-transition tests for EpochCache changes * feat: Pass shufflingCache to EpochCache at startup * test: fix slot off by one for decision root in perf test * chore: use ?. syntax * chore: refactoring * feat: add comments and clean up afterProcessEpoch * fix: perf test slot incrementing * fix: remove MockShufflingCache * Revert "chore: refactoring" This reverts commit 104aa56. * refactor: shufflingCache getters * refactor: shufflingCache setters * refactor: build and getOrBuild * docs: add comments to ShufflingCache methods * chore: lint issues * test: update tests in beacon-node * chore: lint * feat: get shufflings from cache for API * feat: minTimeDelayToBuildShuffling cli flag * test: fix shufflingCache promise insertion test * fix: rebase conflicts * fix: changes from debugging sim tests * refactor: minimize changes in afterProcessEpoch * chore: fix lint * chore: fix check-types * chore: fix check-types * feat: add diff utility * fix: bug in spec tests from invalid nextActiveIndices * refactor: add/remove comments * refactor: remove this.activeIndicesLength from EpochCache * refactor: simplify shufflingCache.getSync * refactor: remove unnecessary undefined's * refactor: clean up ShufflingCache unit test * feat: add metrics for ShufflingCache * feat: add shufflingCache metrics to state-transition * chore: lint * fix: metric name clash * refactor: add comment about not having ShufflingCache in EpochCache * refactor: rename shuffling decision root functions * refactor: remove unused comment * feat: async add nextShuffling to EpochCache after its built * feat: make ShufflingCache.set private * feat: chance metrics to nextShufflingNotOnEpochCache instead of positive case * refactor: move diff to separate PR * chore: fix tests using shufflingCache.set method * feat: remove minTimeDelayToBuild * feat: return promise from insertPromise and then through build * fix: update metrics names and help field * feat: move build of shuffling to beforeProcessEpoch * feat: allow calc of pivot slot before slot increment * fix: calc of pivot slot before slot increment * Revert "fix: calc of pivot slot before slot increment" This reverts commit 5e65f7e. * Revert "feat: allow calc of pivot slot before slot increment" This reverts commit ed850ee. * feat: allow getting current block root for shuffling calculation * fix: get nextShufflingDecisionRoot directly from state.blockRoots * fix: convert toRootHex * docs: add comment about pulling decisionRoot directly from state * feat: add back metrics for regen attestation cache hit/miss * docs: fix docstring on shufflingCache.build * refactor: change validatorIndices to Uint32Array * refactor: remove comment and change variable name * fix: use toRootHex instead of toHexString * refactor: deduplicate moved function computeAnchorCheckpoint * fix: touch up metrics per PR comments * fix: merge conflict * chore: lint * refactor: add scope around activeIndices to GC arrays * feat: directly use Uint32Array instead of transcribing number array to Uint32Array * refactor: activeIndices per tuyen comment * refactor: rename to epochAfterNext * chore: review PR * feat: update no shuffling ApiError to 500 status * fix: add back unnecessary eslint directive. to be remove under separate PR * feat: update no shuffling ApiError to 500 status * docs: add comment about upcomingEpoch --------- Co-authored-by: Cayman <[email protected]> Co-authored-by: Tuyen Nguyen <[email protected]>
* feat: add ShufflingCache to EpochCache * fix: implementation in state-transition for EpochCache with ShufflingCache * feat: remove shufflingCache.processState * feat: implement ShufflingCache changes in beacon-node * feat: pass shufflingCache when loading cached state from db * test: fix state-transition tests for EpochCache changes * feat: Pass shufflingCache to EpochCache at startup * test: fix slot off by one for decision root in perf test * chore: use ?. syntax * chore: refactoring * feat: add comments and clean up afterProcessEpoch * fix: perf test slot incrementing * fix: remove MockShufflingCache * Revert "chore: refactoring" This reverts commit 104aa56. * refactor: shufflingCache getters * refactor: shufflingCache setters * refactor: build and getOrBuild * docs: add comments to ShufflingCache methods * chore: lint issues * test: update tests in beacon-node * chore: lint * feat: get shufflings from cache for API * feat: minTimeDelayToBuildShuffling cli flag * test: fix shufflingCache promise insertion test * fix: rebase conflicts * fix: changes from debugging sim tests * refactor: minimize changes in afterProcessEpoch * chore: fix lint * chore: fix check-types * chore: fix check-types * feat: add diff utility * fix: bug in spec tests from invalid nextActiveIndices * refactor: add/remove comments * refactor: remove this.activeIndicesLength from EpochCache * refactor: simplify shufflingCache.getSync * refactor: remove unnecessary undefined's * refactor: clean up ShufflingCache unit test * feat: add metrics for ShufflingCache * feat: add shufflingCache metrics to state-transition * chore: lint * fix: metric name clash * refactor: add comment about not having ShufflingCache in EpochCache * refactor: rename shuffling decision root functions * refactor: remove unused comment * feat: async add nextShuffling to EpochCache after its built * feat: make ShufflingCache.set private * feat: chance metrics to nextShufflingNotOnEpochCache instead of positive case * refactor: move diff to separate PR * chore: fix tests using shufflingCache.set method * feat: remove minTimeDelayToBuild * feat: return promise from insertPromise and then through build * fix: update metrics names and help field * feat: move build of shuffling to beforeProcessEpoch * feat: allow calc of pivot slot before slot increment * fix: calc of pivot slot before slot increment * Revert "fix: calc of pivot slot before slot increment" This reverts commit 5e65f7e. * Revert "feat: allow calc of pivot slot before slot increment" This reverts commit ed850ee. * feat: allow getting current block root for shuffling calculation * fix: get nextShufflingDecisionRoot directly from state.blockRoots * fix: convert toRootHex * docs: add comment about pulling decisionRoot directly from state * feat: add back metrics for regen attestation cache hit/miss * docs: fix docstring on shufflingCache.build * refactor: change validatorIndices to Uint32Array * refactor: remove comment and change variable name * fix: use toRootHex instead of toHexString * refactor: deduplicate moved function computeAnchorCheckpoint * fix: touch up metrics per PR comments * fix: merge conflict * chore: lint * refactor: add scope around activeIndices to GC arrays * feat: directly use Uint32Array instead of transcribing number array to Uint32Array * refactor: activeIndices per tuyen comment * refactor: rename to epochAfterNext * chore: review PR * feat: update no shuffling ApiError to 500 status * fix: add back unnecessary eslint directive. to be remove under separate PR * feat: update no shuffling ApiError to 500 status * docs: add comment about upcomingEpoch --------- Co-authored-by: Cayman <[email protected]> Co-authored-by: Tuyen Nguyen <[email protected]>
🎉 This PR is included in v1.23.0 🎉 |
Motivation
Replaces #6521
Move calculation of next shuffling to async to get it off of critical path during epoch transition. There is a full second during epoch transition used to calculate the
epochCtx.nextShuffling
and that can be moved to an async process. Refactored a few pieces of theEpochCache
to make this work and will continue under a separate PR for creating a worker that moves this calculation to a worker thread. Also will investigate using the rust shuffling implementation. Can maybe tune further using a worker thread that NICE to interleave the long calculation into thread idle time which could be ideal.Description
IShufflingCache
to pass shuffling cache toEpochCache
so its available within CachedBeaconStateShufflingCache
inbeacon-node
for further updates mentioned abovenextShuffling
from critical path during epoch transitionsshufflingCache.processState
fromShufflingCache
and update process flow to no longer require itShufflingGetter
and useShufflingCache
directlyTODO's Before Review
[ ] - Investigate removal of
MockShufflingCache
fully (may require movingCachedBeaconState
tests tobeacon-node
[ ] - Verify all sim/e2e and other long running tests
[ ] - Confirm spec compliance