fix: EXC-1838 Run hook after CanisterWasmMemoryLimitExceeded error is fixed #3631
```diff
@@ -51,7 +51,7 @@ pub fn execute_update(
     log_dirty_pages: FlagStatus,
     deallocation_sender: &DeallocationSender,
 ) -> ExecuteMessageResult {
-    let (clean_canister, prepaid_execution_cycles, resuming_aborted) =
+    let (mut clean_canister, prepaid_execution_cycles, resuming_aborted) =
         match prepaid_execution_cycles {
             Some(prepaid_execution_cycles) => (clean_canister, prepaid_execution_cycles, true),
             None => {
```
```diff
@@ -147,13 +147,31 @@ pub fn execute_update(
     let helper = match UpdateHelper::new(&clean_canister, &original, deallocation_sender) {
         Ok(helper) => helper,
         Err(err) => {
+            if err.code() == ErrorCode::CanisterWasmMemoryLimitExceeded
+                && original.call_or_task == CanisterCallOrTask::Task(CanisterTask::OnLowWasmMemory)
+            {
+                // `OnLowWasmMemoryHook` was taken from the task_queue (i.e.
+                // `OnLowWasmMemoryHookStatus` is `Executed`), but it was not executed due to the
+                // `WasmMemoryLimitExceeded` error. To ensure that the hook is executed when the
+                // error is resolved, we need to set `OnLowWasmMemoryHookStatus` to `Ready`.
+                // Because of the way `OnLowWasmMemoryHookStatus::update` is implemented, we first
+                // need to remove it from the task_queue (which calls
+                // `OnLowWasmMemoryHookStatus::update(false)`), followed by `enqueue` (which calls
+                // `OnLowWasmMemoryHookStatus::update(true)`), to ensure the desired behavior.
+                clean_canister
+                    .system_state
+                    .task_queue
+                    .remove(ic_replicated_state::ExecutionTask::OnLowWasmMemory);
+                clean_canister
+                    .system_state
+                    .task_queue
+                    .enqueue(ic_replicated_state::ExecutionTask::OnLowWasmMemory);
+            }
             return finish_err(
                 clean_canister,
                 original.execution_parameters.instruction_limits.message(),
                 err,
                 original,
                 round,
-            )
+            );
         }
     };
```
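The remove-then-enqueue dance in the hunk above only makes sense given how `OnLowWasmMemoryHookStatus::update` collapses its transitions. Below is a minimal sketch of that state machine, assuming that `update(true)` leaves an `Executed` hook in `Executed` (so `enqueue` alone would not re-arm it); the enum and transition rules here are illustrative stand-ins, not the actual `ic_replicated_state` code:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OnLowWasmMemoryHookStatus {
    ConditionNotSatisfied,
    Ready,
    Executed,
}

impl OnLowWasmMemoryHookStatus {
    // `enqueue` calls `update(true)`; `remove` calls `update(false)`.
    fn update(&mut self, condition_satisfied: bool) {
        *self = if condition_satisfied {
            match self {
                // Assumed: an already-executed hook is not re-armed by enqueue alone,
                // which is why the PR removes it from the task_queue first.
                Self::Executed => Self::Executed,
                _ => Self::Ready,
            }
        } else {
            Self::ConditionNotSatisfied
        };
    }
}

fn main() {
    let status = OnLowWasmMemoryHookStatus::Executed;

    // enqueue alone: the hook stays Executed and would never fire again.
    let mut only_enqueue = status;
    only_enqueue.update(true);
    assert_eq!(only_enqueue, OnLowWasmMemoryHookStatus::Executed);

    // remove (update(false)) followed by enqueue (update(true)) re-arms it.
    let mut remove_then_enqueue = status;
    remove_then_enqueue.update(false);
    remove_then_enqueue.update(true);
    assert_eq!(remove_then_enqueue, OnLowWasmMemoryHookStatus::Ready);
}
```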
```diff
@@ -341,26 +359,22 @@ impl UpdateHelper {
         validate_message(&canister, &original.method)?;

-        if let CanisterCallOrTask::Call(_) = original.call_or_task {
-            // TODO(RUN-957): Enforce the limit in heartbeat and timer after
-            // canister logging ships by removing the `if` above.
-            let wasm_memory_usage = canister
-                .execution_state
-                .as_ref()
-                .map_or(NumBytes::new(0), |es| {
-                    num_bytes_try_from(es.wasm_memory.size).unwrap()
-                });
-
-            if let Some(wasm_memory_limit) = clean_canister.system_state.wasm_memory_limit {
-                // A Wasm memory limit of 0 means unlimited.
-                if wasm_memory_limit.get() != 0 && wasm_memory_usage > wasm_memory_limit {
-                    let err = HypervisorError::WasmMemoryLimitExceeded {
-                        bytes: wasm_memory_usage,
-                        limit: wasm_memory_limit,
-                    };
-                    return Err(err.into_user_error(&canister.canister_id()));
-                }
-            }
-        }
+        let wasm_memory_usage = canister
+            .execution_state
+            .as_ref()
+            .map_or(NumBytes::new(0), |es| {
+                num_bytes_try_from(es.wasm_memory.size).unwrap()
+            });
+
+        if let Some(wasm_memory_limit) = clean_canister.system_state.wasm_memory_limit {
+            // A Wasm memory limit of 0 means unlimited.
+            if wasm_memory_limit.get() != 0 && wasm_memory_usage > wasm_memory_limit {
+                let err = HypervisorError::WasmMemoryLimitExceeded {
+                    bytes: wasm_memory_usage,
+                    limit: wasm_memory_limit,
+                };
+                return Err(err.into_user_error(&canister.canister_id()));
+            }
+        }
```

Review discussion on this hunk:

> Before this PR, we wouldn't be enforcing the limit for system tasks here, so I'm not sure why this PR is needed at all; the current effect of this PR seems to be as follows:

> This sounds expected to me, and it will behave the same way as in the update case. In my opinion, having homogeneous behavior of tasks/updates is a plus.

> I can see the point in this one; maybe you are right. If the developer uses the hook to notify himself that memory is below the threshold, having the hook stopped in this case may be unexpected.

> There's still one more TODO(RUN-957) in the code to be resolved. CC @dragoljub-duric

> But in my opinion, it seems safer to not enforce the limit during a system task, i.e., simply drop the other TODO(RUN-957), instead of trapping during a system task.
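The relocated check itself is compact. A standalone sketch of its semantics, with a stand-in `NumBytes` type (only the zero-means-unlimited rule is taken from the diff above; everything else is illustrative):

```rust
// Stand-in for the ic type of the same name.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct NumBytes(u64);

// Returns true iff the canister's Wasm memory usage violates its limit.
fn exceeds_wasm_memory_limit(usage: NumBytes, limit: Option<NumBytes>) -> bool {
    match limit {
        // A Wasm memory limit of 0 means unlimited.
        Some(limit) => limit.0 != 0 && usage > limit,
        // No limit configured: never exceeded.
        None => false,
    }
}

fn main() {
    assert!(exceeds_wasm_memory_limit(NumBytes(10), Some(NumBytes(5))));
    assert!(!exceeds_wasm_memory_limit(NumBytes(10), Some(NumBytes(0)))); // 0 = unlimited
    assert!(!exceeds_wasm_memory_limit(NumBytes(10), None));
    assert!(!exceeds_wasm_memory_limit(NumBytes(10), Some(NumBytes(10)))); // limit is inclusive
}
```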
Review discussion on failing before execution vs. charging cycles:

> @dragoljub-duric Wouldn't it be better to not perform any execution in this case (instead of spending cycles on an execution that fails)? Isn't it possible to check the limits in advance, before running the execution?

> `UpdateHelper::new` immediately checks the limit, in this file on line 371 below. So we could move this check into `UpdateHelper::new`, but that would require refactoring `UpdateHelper::new` because, in the case of the error, it should return a modified state (the state where we put the hook back on the task queue). Does that answer your question?

> Not really, because it is not clear to me (without looking into the code, which I'm not super familiar with) at what point in time the failure happens and whether cycles are charged. Could you please clarify that a bit more?

> https://github.com/dfinity/ic/pull/3631/files#r1942802433

> Context: we concluded that we're charging the base fee of 5M cycles per execution nonetheless, as `update_message_execution_fee` in `prepay_execution_cycles`, which is not refunded in this case.

> The quick fix I see is that in this case we can refund an additional 5M. @mraszyk what do you think?

> As a quick fix it makes sense, but I wonder if the code doesn't become fragile due to such a fix. Do you see a way to avoid the refund by not preparing the execution (prepaying etc.) at all?

> I think it is doable; I am trying to add a check in `execute_call_or_task` before calling `prepay_execution_cycles`.

> This check could then ideally also apply to the global timer etc.

> Yes, it will apply to all updates/tasks.
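The ordering proposed in the thread can be sketched as follows. The names `execute_call_or_task` and `prepay_execution_cycles` and the 5M base fee figure come from the discussion above; the function bodies are hypothetical stand-ins (the real code operates on full canister state, not a bare cycle counter). The point is only the ordering: check the limit before prepaying, so the base fee is never charged on this error path and nothing needs refunding:

```rust
// Base execution fee figure taken from the thread above; illustrative only.
const UPDATE_MESSAGE_EXECUTION_FEE: u64 = 5_000_000;

// Stand-in for the real prepayment: deducts the base fee from the balance.
fn prepay_execution_cycles(balance: &mut u64) {
    *balance -= UPDATE_MESSAGE_EXECUTION_FEE;
}

// Hypothetical early exit: validate the Wasm memory limit (0 means unlimited)
// *before* any cycles are prepaid.
fn execute_call_or_task(usage: u64, limit: u64, balance: &mut u64) -> Result<(), &'static str> {
    if limit != 0 && usage > limit {
        // Fail before prepayment: the balance is untouched, so no refund is needed.
        return Err("CanisterWasmMemoryLimitExceeded");
    }
    prepay_execution_cycles(balance);
    // ... actual execution would happen here ...
    Ok(())
}

fn main() {
    let mut balance = 10_000_000;

    // Failing early leaves the balance untouched: nothing to refund.
    assert!(execute_call_or_task(200, 100, &mut balance).is_err());
    assert_eq!(balance, 10_000_000);

    // A passing check prepays the base fee as before.
    assert!(execute_call_or_task(50, 100, &mut balance).is_ok());
    assert_eq!(balance, 5_000_000);
}
```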