Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jwk #1: validator txn for publishing updates #11853

Merged
merged 21 commits into from
Feb 4, 2024
Merged

Conversation

zjma
Copy link
Contributor

@zjma zjma commented Feb 1, 2024

Description

Test Plan

Copy link

trunk-io bot commented Feb 1, 2024

⏱️ 52h 46m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-coverage 8h 55m 🟩🟩
windows-build 8h 49m 🟩🟩🟩🟩🟩 (+33 more)
rust-smoke-coverage 6h 12m 🟩🟩
rust-unit-tests 4h 37m 🟥🟥🟩 (+31 more)
rust-move-unit-coverage 4h 4m 🟥🟩🟩 (+30 more)
rust-move-tests 2h 51m 🟥🟩🟩 (+30 more)
run-tests-main-branch 2h 36m 🟩🟩🟩🟩🟩 (+32 more)
rust-lints 2h 9m 🟥🟩🟩 (+31 more)
execution-performance / single-node-performance 2h 8m 🟩🟩🟩🟩🟩 (+3 more)
rust-smoke-tests 2h 2m 🟥🟩🟩 (+2 more)
check 1h 49m 🟩🟩🟩🟩 (+33 more)
check-dynamic-deps 1h 22m 🟩🟩🟩🟩🟩 (+33 more)
rust-images / rust-all 1h 13m 🟥🟩🟩 (+2 more)
general-lints 1h 12m 🟩🟩🟩🟩🟩 (+31 more)
forge-e2e-test / forge 49m 🟩🟩🟩
forge-compat-test / forge 42m 🟩🟩🟩
cli-e2e-tests / run-cli-tests 28m 🟩🟩🟩
semgrep/ci 14m 🟩🟩🟩🟩🟩 (+33 more)
indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose 6m 🟥🟥🟩🟥
file_change_determinator 6m 🟩🟩🟩🟩🟩 (+33 more)
file_change_determinator 6m 🟩🟩🟩🟩 (+33 more)
node-api-compatibility-tests / node-api-compatibility-tests 4m 🟩🟩🟩🟩
permission-check 2m 🟩🟩🟩🟩🟩 (+33 more)
permission-check 2m 🟩🟩🟩🟩🟩 (+33 more)
permission-check 2m 🟩🟩🟩🟩🟩 (+33 more)
permission-check 2m 🟩🟩🟩🟩🟩 (+33 more)
execution-performance / file_change_determinator 1m 🟩🟩🟩🟩🟩 (+3 more)
file_change_determinator 1m 🟩🟩🟩🟩 (+3 more)
upload-to-codecov 24s 🟩🟩
permission-check 20s 🟩🟩🟩🟩🟩 (+3 more)
determine-docker-build-metadata 17s 🟩🟩🟩🟩🟩 (+3 more)

🚨 4 jobs on the last run were significantly faster/slower than expected

Job Duration vs 7d avg Delta
check-dynamic-deps 5m 2m +126%
rust-lints 12m 8m +50%
rust-images / rust-all 17m 13m +26%
windows-build 23m 19m +25%

settingsfeedbackdocs ⋅ learn more about trunk.io

@zjma zjma force-pushed the zjma/jwk_txn_execution branch from 029e686 to d0957db Compare February 1, 2024 10:04
@zjma zjma changed the base branch from main to zjma/jwk_types_update February 1, 2024 10:07
Copy link

codecov bot commented Feb 1, 2024

Codecov Report

Attention: 245 lines in your changes are missing coverage. Please review.

Comparison is base (be3394f) 71.2% compared to head (58b9493) 71.2%.
Report is 5 commits behind head on main.

Files Patch % Lines
aptos-move/aptos-vm/src/validator_txns/jwk.rs 0.0% 83 Missing ⚠️
types/src/jwks/mod.rs 0.0% 44 Missing ⚠️
types/src/jwks/jwk/mod.rs 0.0% 28 Missing ⚠️
types/src/jwks/unsupported/mod.rs 0.0% 23 Missing ⚠️
types/src/jwks/rsa/mod.rs 0.0% 21 Missing ⚠️
aptos-move/aptos-vm/src/aptos_vm.rs 16.6% 10 Missing ⚠️
types/src/dkg/mod.rs 0.0% 9 Missing ⚠️
types/src/validator_txn.rs 0.0% 7 Missing ⚠️
aptos-move/aptos-vm/src/system_module_names.rs 0.0% 6 Missing ⚠️
types/src/move_any.rs 0.0% 6 Missing ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff            @@
##             main   #11853     +/-   ##
=========================================
- Coverage    71.2%    71.2%   -0.1%     
=========================================
  Files         795      795             
  Lines      182802   182968    +166     
=========================================
+ Hits       130264   130282     +18     
- Misses      52538    52686    +148     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zjma zjma changed the title jwk update vtxn jwk txn def and execution implementation Feb 1, 2024
@zjma zjma changed the title jwk txn def and execution implementation jwk #1: jwk txn def and execution implementation Feb 1, 2024
@zjma zjma added the JWK label Feb 1, 2024
@zjma zjma changed the title jwk #1: jwk txn def and execution implementation jwk #1: validator txn for publishing updates Feb 1, 2024

enum ExpectedFailure {
// Move equivalent: `errors::invalid_argument(*)`
IncorrectVersion = 0x010103,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do these 0x... values mean here? does it map to some values at other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, see aptos-move/framework/aptos-framework/sources/jwks.move

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't find it. Can you share a link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const ENATIVE_MISSING_RESOURCE_VALIDATOR_SET: u64 = 0x0101;
const ENATIVE_MISSING_RESOURCE_OBSERVED_JWKS: u64 = 0x0102;
const ENATIVE_INCORRECT_VERSION: u64 = 0x0103;
const ENATIVE_MULTISIG_VERIFICATION_FAILED: u64 = 0x0104;
const ENATIVE_NOT_ENOUGH_VOTING_POWER: u64 = 0x0105;

types/src/validator_txn.rs Show resolved Hide resolved
match self {
ValidatorTransaction::DummyTopic1(_) => Topic::DUMMY1,
ValidatorTransaction::DKGResult(_) => Topic::DKG,
ValidatorTransaction::DummyTopic2(_) => Topic::DUMMY2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why dummy is used outside out test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, dummy has gone

match self.process_jwk_update_inner(resolver, log_context, session_id, update) {
Ok((vm_status, vm_output)) => Ok((vm_status, vm_output)),
Err(Expected(failure)) => {
// Pretend we are inside Move, and expected failures are like Move aborts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to simulate move abort? normal move abort is not discard too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to. But I had the impression that In the future we may want to move the checks into move.

Discarding or not is a separate question? And for vtxn we want to discard if the checks failed?

Base automatically changed from zjma/jwk_types_update to main February 2, 2024 20:44
Copy link
Contributor

@danielxiangzl danielxiangzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can approve once dummy vtxn is marked test-only.

@zjma zjma marked this pull request as ready for review February 3, 2024 02:34
@zjma zjma requested a review from sasha8 as a code owner February 3, 2024 02:34
@zjma
Copy link
Contributor Author

zjma commented Feb 3, 2024

Can approve once dummy vtxn is marked test-only.

fixed, dummy has gone

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

// Pretend we are inside Move, and expected failures are like Move aborts.
Ok((
VMStatus::MoveAbort(AbortLocation::Script, failure as u64),
VMOutput::empty_with_status(TransactionStatus::Discard(StatusCode::ABORTED)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move aborts are kept, so we want a Keep here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe what we really want is to discard (otherwise we allow malicious validators to put things in)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just not sure what's the best value for the 1st element of the returning tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other vmstatus variants seem to be more serious and can cause the whole block to be rejected. That's undesired behavior

Copy link
Contributor

@georgemitenkov georgemitenkov Feb 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we do not need VMOutput, and can return a VMStatus? For example, for prologue which runs Move code we do it like this: https://github.com/aptos-labs/aptos-core/blob/main/aptos-move/aptos-vm/src/errors.rs#L64 which rempas the error to status and returns just that. The caller can see Err(status) and create an empty output with a Discard?

Copy link
Contributor Author

@zjma zjma Feb 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be clear, if the transaction is valid, we want to make changes; otherwise (expected failures), we want to discard.

Are you are suggesting change: from

    fn process_jwk_update(...) -> Result<(VMStatus, VMOutput), VMStatus> {
        match self.process_jwk_update_inner(...) {
            //...
            Err(Expected(failure)) => {
                // Pretend we are inside Move, and expected failures are like Move aborts.
                Ok((
                    VMStatus::MoveAbort(AbortLocation::Script, failure as u64),
                    VMOutput::empty_with_status(TransactionStatus::Discard(StatusCode::ABORTED)),
                ))
            },
            //...
        }
    }

to

    fn process_jwk_update(...) -> Result<(VMStatus, VMOutput), VMStatus> {
        match self.process_jwk_update_inner(...) {
            //...
            Err(Expected(failure)) => {
                // Pretend we are inside Move, and expected failures are like Move aborts.
                Err(VMStatus::SOMETHING)
            },
            //...
        }
    }

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't return errors here, that will kill the block execution. we can keep it as is but it's a bit ugly

@zjma zjma enabled auto-merge (squash) February 4, 2024 08:09

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

github-actions bot commented Feb 4, 2024

✅ Forge suite compat success on aptos-node-v1.8.3 ==> 58b9493b2076199c8a64ae4163fb361cb6f83555

Compatibility test results for aptos-node-v1.8.3 ==> 58b9493b2076199c8a64ae4163fb361cb6f83555 (PR)
1. Check liveness of validators at old version: aptos-node-v1.8.3
compatibility::simple-validator-upgrade::liveness-check : committed: 5086 txn/s, latency: 6377 ms, (p50: 5300 ms, p90: 9600 ms, p99: 21400 ms), latency samples: 198360
2. Upgrading first Validator to new version: 58b9493b2076199c8a64ae4163fb361cb6f83555
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1779 txn/s, latency: 15709 ms, (p50: 18900 ms, p90: 21900 ms, p99: 22500 ms), latency samples: 92520
3. Upgrading rest of first batch to new version: 58b9493b2076199c8a64ae4163fb361cb6f83555
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1856 txn/s, latency: 15787 ms, (p50: 18500 ms, p90: 22300 ms, p99: 22600 ms), latency samples: 92800
4. upgrading second batch to new version: 58b9493b2076199c8a64ae4163fb361cb6f83555
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3629 txn/s, latency: 8835 ms, (p50: 9900 ms, p90: 12500 ms, p99: 13700 ms), latency samples: 141540
5. check swarm health
Compatibility test for aptos-node-v1.8.3 ==> 58b9493b2076199c8a64ae4163fb361cb6f83555 passed
Test Ok

Copy link
Contributor

github-actions bot commented Feb 4, 2024

✅ Forge suite realistic_env_max_load success on 58b9493b2076199c8a64ae4163fb361cb6f83555

two traffics test: inner traffic : committed: 6660 txn/s, latency: 5723 ms, (p50: 5400 ms, p90: 7600 ms, p99: 12800 ms), latency samples: 2890440
two traffics test : committed: 100 txn/s, latency: 2243 ms, (p50: 2200 ms, p90: 2600 ms, p99: 4400 ms), latency samples: 1880
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.225, avg: 0.199", "QsPosToProposal: max: 0.170, avg: 0.153", "ConsensusProposalToOrdered: max: 0.599, avg: 0.571", "ConsensusOrderedToCommit: max: 0.515, avg: 0.478", "ConsensusProposalToCommit: max: 1.091, avg: 1.048"]
Max round gap was 1 [limit 4] at version 1105435. Max no progress secs was 6.533544 [limit 15] at version 1105435.
Test Ok

@zjma zjma merged commit 793563f into main Feb 4, 2024
46 of 47 checks passed
@zjma zjma deleted the zjma/jwk_txn_execution branch February 4, 2024 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR JWK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants