Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(upgrader): introduce disaster recovery request operation #461

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mraszyk
Copy link
Collaborator

@mraszyk mraszyk commented Dec 13, 2024

This PR introduces a disaster recovery request operation type in the upgrader canister (to later support more operation types beyond the existing install code operation type).

Because this is a breaking change for the upgrader's stable memory layout, the post-upgrade hook prevents an upgrade if there are pending disaster recovery requests (that would fail to deserialize after refactoring their types in this PR):

Request execution failed due to `failed to upgrade upgrader: UPGRADE_FAILED: System upgrade failed. (Some({"reason": "The canister change failed due to failed to install code: Error from Canister dxw6e-nqaaa-aaaal-amrcq-cai: Canister called `ic0.trap` with message: Panicked at 'called `Result::unwrap()` on an `Err` value: ErrorImpl { code: Message(\"missing field `operation`\"), offset: 0 }', core/upgrader/impl/src/model/disaster_recovery.rs:435:1\nCanister Backtrace:\nunknown function at index 2108\nunknown function at index 460\nunknown function at index 773\nunknown function at index 622\nunknown function at index 1040\nunknown function at index 340\nunknown function at index 1351\nunknown function at index 341\nunknown function at index 1276\nunknown function at index 815\n.\nConsider gracefully handling failures from this canister or altering the canister to handle exceptions. See documentation: http://internetcomputer.org/docs/current/references/execution-errors#trapped-explicitly"}))`.

@mraszyk mraszyk marked this pull request as ready for review December 18, 2024 10:42
@mraszyk mraszyk requested a review from a team as a code owner December 18, 2024 10:42
.recovery_requests
.is_empty()
{
ic_cdk::trap("upgrader cannot be upgraded due to pending disaster recovery requests")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you recover from this state? The only place where recovery_requests is mutated is in evaluate_requests, but that only does it if the recovery rule is met. It does call retain on the requests to clean up the expired ones, but does not actually save it unless the requests meet evaluation.

Also at this point the incompatible stable memory is deserialized, no? So in case the vec is not empty, this has already trapped, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you recover from this state?

The expectation is that the upgrader canister is not itself upgraded if there are pending disaster recoveries.

Also at this point the incompatible stable memory is deserialized, no? So in case the vec is not empty, this has already trapped, no?

That's true, the error message indeed refers to failed deserialization.

Copy link
Contributor

@olaszakos olaszakos Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expectation is that the upgrader canister is not itself upgraded if there are pending disaster recoveries.

But it can happen that a request is there for some reason, and then the upgrader can no longer be upgraded until the next recovery request passes.

That's true, the error message indeed refers to failed deserialization.

What I mean is that maybe the is_empty() call already traps and the subsequent trap is ineffective as execution never gets there. I don't know if is_empty already deserializes something incompatible, that depends on the internals of the stable memory library. Maybe adopt the custom serde deserializer-based migration from the station? And then the requests can safely be migrated.

@mraszyk mraszyk marked this pull request as draft January 6, 2025 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants