fix: EXC-1838 Run hook after CanisterWasmMemoryLimitExceeded error is fixed #3631
Conversation
LGTM, thanks!
if err.code() == ErrorCode::CanisterWasmMemoryLimitExceeded
    && original.call_or_task == CanisterCallOrTask::Task(CanisterTask::OnLowWasmMemory)
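For readers skimming the thread, here is a minimal, self-contained sketch of what this branch accomplishes: re-enqueueing the low-Wasm-memory hook when its execution fails with `CanisterWasmMemoryLimitExceeded`. The enums and the `should_reschedule_hook` helper below are simplified stand-ins for illustration, not the replica's actual definitions.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ErrorCode {
    CanisterWasmMemoryLimitExceeded,
    CanisterTrapped,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum CanisterTask {
    GlobalTimer,
    OnLowWasmMemory,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum CanisterCallOrTask {
    Call,
    Task(CanisterTask),
}

/// Hypothetical helper: should the failed task be put back on the queue?
fn should_reschedule_hook(err_code: ErrorCode, call_or_task: CanisterCallOrTask) -> bool {
    err_code == ErrorCode::CanisterWasmMemoryLimitExceeded
        && call_or_task == CanisterCallOrTask::Task(CanisterTask::OnLowWasmMemory)
}

fn main() {
    let mut task_queue: Vec<CanisterTask> = Vec::new();
    // Simulate the low-Wasm-memory hook failing because the limit is exceeded.
    let err_code = ErrorCode::CanisterWasmMemoryLimitExceeded;
    let call_or_task = CanisterCallOrTask::Task(CanisterTask::OnLowWasmMemory);
    if should_reschedule_hook(err_code, call_or_task) {
        // Keep the hook pending so it fires once the error is fixed.
        task_queue.push(CanisterTask::OnLowWasmMemory);
    }
    assert_eq!(task_queue, vec![CanisterTask::OnLowWasmMemory]);
}
```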
@dragoljub-duric Wouldn't it be better to not perform any execution in this case (instead of spending cycles on an execution that fails)? Isn't it possible to check the limits in advance before running the execution?
UpdateHelper::new immediately checks the limit (in this file, on line 371 below). So we could move this check into UpdateHelper::new, but that would require refactoring UpdateHelper::new because, in the error case, it would have to return a modified state (the state where we put the hook back on the task queue). Does that answer your question?
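To illustrate the refactoring constraint described above, here is a rough sketch with hypothetical types and signatures: a constructor that, on failure, must hand back the modified state together with the error instead of returning the error alone.

```rust
/// Simplified stand-in for the canister state (e.g. its task queue).
struct CanisterState {
    pending_tasks: Vec<&'static str>,
}

struct UserError(String);

struct UpdateHelper {
    state: CanisterState,
}

impl UpdateHelper {
    /// On success the helper takes ownership of the state; on failure the
    /// (possibly modified) state is returned alongside the error so the
    /// caller can keep it, e.g. with the hook re-enqueued.
    fn new(
        mut state: CanisterState,
        wasm_memory_limit_exceeded: bool,
    ) -> Result<Self, (UserError, CanisterState)> {
        if wasm_memory_limit_exceeded {
            // Put the hook back on the task queue before bailing out.
            state.pending_tasks.push("OnLowWasmMemory");
            return Err((UserError("Wasm memory limit exceeded".to_string()), state));
        }
        Ok(Self { state })
    }
}

fn main() {
    let state = CanisterState { pending_tasks: Vec::new() };
    match UpdateHelper::new(state, true) {
        Ok(_) => unreachable!(),
        Err((_err, state)) => assert_eq!(state.pending_tasks, vec!["OnLowWasmMemory"]),
    }
}
```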
> Does that answer your question?

Not really, because it is not clear to me (without looking into the code, which I'm not super familiar with) at what point in time the failure happens and whether cycles are charged. Could you please clarify that a bit more?
Context: we concluded that we're charging the base fee of 5M cycles per execution nonetheless, as update_message_execution_fee in prepay_execution_cycles, which is not refunded in this case.
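A toy illustration of why a failed run still costs cycles under this scheme: the base fee is taken in the prepay step, and only the unused-instruction portion is refunded afterwards. The numbers, names, and rates below are made up for the example; this is not the replica's actual accounting code.

```rust
const BASE_EXECUTION_FEE: u64 = 5_000_000; // the "5M cycles" mentioned above
const FEE_PER_INSTRUCTION: u64 = 1; // hypothetical rate for the example

struct CyclesAccount {
    balance: u64,
}

impl CyclesAccount {
    // Charge the base fee plus a prepayment for the full instruction limit.
    fn prepay_execution(&mut self, instruction_limit: u64) -> u64 {
        let prepaid = BASE_EXECUTION_FEE + instruction_limit * FEE_PER_INSTRUCTION;
        self.balance -= prepaid;
        prepaid
    }

    // Refund only the unused instruction portion; the base fee is kept.
    fn refund_unused(&mut self, instruction_limit: u64, instructions_used: u64) {
        let refund = (instruction_limit - instructions_used) * FEE_PER_INSTRUCTION;
        self.balance += refund;
    }
}

fn main() {
    let mut account = CyclesAccount { balance: 100_000_000 };
    account.prepay_execution(1_000_000);
    // The execution fails before running any instructions...
    account.refund_unused(1_000_000, 0);
    // ...yet the balance is still down by the base fee.
    assert_eq!(account.balance, 100_000_000 - BASE_EXECUTION_FEE);
    println!("balance after failed run: {}", account.balance);
}
```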
The quick fix I see is that in this case, we can refund an additional 5M. @mraszyk what do you think?
> The quick fix I see is that in this case

As a quick fix it makes sense, but I wonder whether the code becomes fragile due to such a fix. Do you see a way to avoid the refund by not preparing the execution (prepaying etc.) at all?
I think it is doable; I am trying to add a check in execute_call_or_task before calling prepay_execution_cycles.
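A sketch of the ordering being proposed, using hypothetical helper names and simplified types: perform the Wasm memory limit check at the start of execute_call_or_task, before any cycles are prepaid, so a doomed execution is rejected without charging the base fee. The real check lives in the replica's execution environment; this only illustrates the ordering.

```rust
#[derive(Debug)]
enum ExecError {
    WasmMemoryLimitExceeded { used: u64, limit: u64 },
}

struct CanisterMemory {
    used_wasm_memory: u64,
    wasm_memory_limit: u64,
}

// Hypothetical pre-flight check performed before `prepay_execution_cycles`.
fn check_wasm_memory_limit(mem: &CanisterMemory) -> Result<(), ExecError> {
    if mem.used_wasm_memory > mem.wasm_memory_limit {
        return Err(ExecError::WasmMemoryLimitExceeded {
            used: mem.used_wasm_memory,
            limit: mem.wasm_memory_limit,
        });
    }
    Ok(())
}

fn execute_call_or_task(mem: &CanisterMemory) -> Result<(), ExecError> {
    // 1. Reject early, before any cycles are prepaid.
    check_wasm_memory_limit(mem)?;
    // 2. Only now prepay cycles and run the message (elided here).
    // prepay_execution_cycles(...);
    // run_wasm(...);
    Ok(())
}

fn main() {
    let mem = CanisterMemory { used_wasm_memory: 110, wasm_memory_limit: 100 };
    // The limit is already exceeded, so we bail out before prepaying anything.
    assert!(execute_call_or_task(&mem).is_err());
}
```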
This check could then ideally also apply to global timer etc.
Yes, it will apply to all updates/tasks.
@@ -341,26 +359,22 @@ impl UpdateHelper {

    validate_message(&canister, &original.method)?;

    if let CanisterCallOrTask::Call(_) = original.call_or_task {
        // TODO(RUN-957): Enforce the limit in heartbeat and timer after
There's still one more TODO(RUN-957) in the code to be resolved. CC @dragoljub-duric
But in my opinion, it seems safer not to enforce the limit during a system task, i.e., to simply drop the other TODO(RUN-957), rather than trapping during a system task.
@@ -341,26 +359,22 @@ impl UpdateHelper {

    validate_message(&canister, &original.method)?;

    if let CanisterCallOrTask::Call(_) = original.call_or_task {
Before this PR, we wouldn't be enforcing the limit for system tasks here, so I'm not sure why this PR is needed at all; the current effect of this PR seems to be as follows:
- global timers and heartbeats fail if the wasm memory limit is exceeded initially (although they succeed if the wasm memory limit is exceeded during their execution): this behavior seems surprising to me
- low on wasm memory hooks are retried if the wasm memory limit is exceeded initially (although they succeed if the wasm memory limit is exceeded during their execution): this behavior might be undesirable since the hook might be crucial in resolving the exceeded wasm memory limit and it wouldn't run due to this PR.
> global timers and heartbeats fail if the wasm memory limit is exceeded initially (although they succeed if the wasm memory limit is exceeded during their execution): this behavior seems surprising to me

This sounds expected to me, and it will behave the same way as in the update case. In my opinion, having homogeneous behavior across tasks and updates is a plus.

> low on wasm memory hooks are retried if the wasm memory limit is exceeded initially (although they succeed if the wasm memory limit is exceeded during their execution): this behavior might be undesirable since the hook might be crucial in resolving the exceeded wasm memory limit and it wouldn't run due to this PR.

I can see the point in this one; maybe you are right. If the developer uses the hook to be notified that remaining Wasm memory is below the threshold, having the hook stopped in this case may be unexpected.
Problem:
As previously observed by @berestovskyy in #3455 (comment), it may happen that execution of the low_wasm_memory hook is stopped when wasm_memory_limit < used_wasm_memory.

Solution:
If that happens, run the hook after the error is fixed, provided the hook condition remains satisfied.
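For context, a rough sketch of the hook condition mentioned in the solution, with simplified types (see the interface spec and the replica code for the authoritative definition): the low-Wasm-memory hook is due when the remaining Wasm memory, i.e. wasm_memory_limit - used_wasm_memory, drops below wasm_memory_threshold. The point is that the condition can remain satisfied after the limit error is fixed, so the re-enqueued hook still runs.

```rust
struct WasmMemoryState {
    used_wasm_memory: u64,
    wasm_memory_limit: u64,
    wasm_memory_threshold: u64,
}

// Hypothetical helper: is the hook condition (still) satisfied?
fn low_wasm_memory_hook_condition(s: &WasmMemoryState) -> bool {
    let remaining = s.wasm_memory_limit.saturating_sub(s.used_wasm_memory);
    remaining < s.wasm_memory_threshold
}

fn main() {
    // While the limit is exceeded the hook cannot run, but the condition holds,
    // so the hook stays scheduled.
    let exceeded = WasmMemoryState {
        used_wasm_memory: 110,
        wasm_memory_limit: 100,
        wasm_memory_threshold: 20,
    };
    assert!(low_wasm_memory_hook_condition(&exceeded));

    // After the error is fixed (e.g. the limit is raised), the condition may
    // still hold, so the re-enqueued hook runs.
    let fixed = WasmMemoryState {
        used_wasm_memory: 110,
        wasm_memory_limit: 120,
        wasm_memory_threshold: 20,
    };
    assert!(low_wasm_memory_hook_condition(&fixed));
}
```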