-
Notifications
You must be signed in to change notification settings - Fork 67
enhancement: WASM error codes, early exit on error or kill switch #1337
Conversation
cac8eaa
to
e74645e
Compare
|
@deekerno's PR was a starting point. This PR adds a few things. From changelog: FFI functions can access the kill switch indicator and exit early when the kill switch has been triggered. (added in this PR) The most significant difference is how the error codes are returned: e74645e#diff-447e55cb0fae7d02ac8fd8f7de5a521cbf60d3f2422b5166c39faeb9f995bea6L27-L28 In @deekerno's PR, the result is This way, even It now returns |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Following your testing steps I get the following output:
2023-09-11T16:02:51.282146Z WARN fuel_indexer::executor: 117: No end_block specified in manifest. Indexer will run forever.
2023-09-11T16:02:51.282175Z INFO fuel_indexer::service: 335: Indexer(fuellabs.explorer) was replaced. Stopping previous version of Indexer(fuellabs.explorer).
2023-09-11T16:02:51.282689Z ERROR fuel_indexer::executor: 937: Indexer(fuellabs.explorer) WASM execution failed: Kill switch has been triggered.
2023-09-11T16:02:51.283092Z INFO fuel_indexer::executor: 195: Kill switch flipped, stopping Indexer(fuellabs.explorer). <('.')>
UX Questions:
- Why am I seeing an
error!
about something failing?- My re-deployment worked so ideally I shouldn't see any errors (as this can be confusing)
- Maybe we can
warn!
any additional details?
- Maybe we can
- My re-deployment worked so ideally I shouldn't see any errors (as this can be confusing)
- The last log message: "kill switch flipped" implies that the previous indexer was stopped (ok), but the message stops there, there's no additional info to suggest that my re-deployment worked
Still reviewing the PR. But the logs are confusing me as to what's actually happening (from a user perspective)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up, @lostman! I left a few comments on the code itself.
Also, in addition to your manual testing, we should figure out a way to test the error codes themselves. Part of the reason my PR was still a WIP was that I hadn't figured out a good way to cause the errors (and that I was busy with other things 😅).
@ra0x3, agreed. I will add a special-case for the kill switch. That is, after all, the expected behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Did a first pass
- For changes like this it would give us more confidence if there was an associated test to prove the functionality works (we already don't have many
service
orexecutor
tests as is)- Tests also make the review easier
88d388b
to
92e0139
Compare
I added a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some more feedback
- @lostman I also wouldn't rush this PR
- We do have a deadline, but this work should be able to make it in comfortably before that date, with plenty of time to spare
- This is an extremely sensitive PR, so let's not rush it
@@ -833,7 +833,7 @@ impl From<ObjectDecoder> for TokenStream { | |||
.map(|query| query.to_string()) | |||
.collect::<Vec<_>>(); | |||
|
|||
d.lock().await.put_many_to_many_record(queries).await; | |||
d.lock().await.put_many_to_many_record(queries).await.expect(&format!("Entity::save_many_to_many for {} failed.", stringify!(#ident))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think here we should be logging the error messages (and
.expect
) as close to the actual culprit call as possible - In this case the culprit of this error would be
fuel_indexer::database::Database:: put_many_to_many_record
- If the error is logged/handled there, we don't need to handle it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is handled in put_many_to_many_record
. In WASM, it is a host function that returns Result<>
, which triggers early termination.
The code here is for native execution. This expect
achieves an equivalent behavior.
@@ -868,7 +871,7 @@ impl From<ObjectDecoder> for TokenStream { | |||
Self::TYPE_ID, | |||
self.to_row(), | |||
serialize(&self.to_row()) | |||
).await; | |||
).await.expect(&format!("Entity::save for {} failed.", stringify!(#ident))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here:
- I think here we should be logging the error messages (and
.expect
) as close to the actual culprit call as possible - In this case the culprit of this error would be
fuel_indexer::database::Database:: put_object
- If the error is logged/handled there, we don't need to handle it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is handled in put_object
. In WASM, it is a host function that returns Result<>, which triggers early termination.
The code here is for native execution. This expect achieves an equivalent behavior.
I'd rather merge this now (very soon) and have it battle-tested for a couple of weeks than merge it just before the deadline and have no time to resolve any issues that may arise. I need this to finish #1150 and again, I'd rather test both for a couple of weeks before the deadline is upon us. |
b6c0a78
to
adee0f8
Compare
adee0f8
to
a4549a8
Compare
I added an adee0f8#diff-7121bb4841992a7a257dbb97c0e60f371c59f2845b4413d5413b7adf779feb5aR22 and some https://github.com/FuelLabs/fuel-indexer/actions/runs/6171631718/job/16750296869#step:13:106 All integration tests succeeded, and I could compile and deploy indexers. Any idea what could be going on? It would've been a nice use-case for this function. |
184fbf7
to
c295242
Compare
@lostman Is this mean to be re-reviewed? I do remember leaving a previous review, but I see I'm ping'd for review again. But I also see a lot of the feedback was not implemented (and as well no reason was provided as to why). |
@ra0x3, yes, it is meant to be re-reviewed. I can't re-trigger the review request since it is already re-requested. I implemented your feedback. What is missing? |
@lostman The review comments are unresolved, and they aren't labeled as "outdated". For any review comments where you implement the feedback, be sure to resolve the conversation (so it doesn't clutter the review, and look as if it's unresolved). And for anything you didn't implement, just leave a responding comment as to why you chose to to implement the feedback. |
Co-authored-by: rashad <[email protected]>
Co-authored-by: rashad <[email protected]>
@lostman CI failing |
Description
UPDATE: in addition to adding error codes to host functions as described below, this PR adds another host function:
Using this function we can remove more
unwrap
orexpect
calls.Building on #1293.
This PR follows the example in
wasmer
repository:https://github.com/wasmerio/wasmer/blob/master/examples/early_exit.rs#L60
Note that the return value is
Result<(), ExitCode>
while the function is imported in WASM ashttps://github.com/wasmerio/wasmer/blob/master/examples/early_exit.rs#L41
That is, as if it were
fn early_exit();
.This PR does the same for our FFI functions. For instance:
While their types—as declared to the WASM module which will use them—doesn't mention the
Result
value:https://github.com/FuelLabs/fuel-indexer/pull/1337/files#diff-447e55cb0fae7d02ac8fd8f7de5a521cbf60d3f2422b5166c39faeb9f995bea6R21-R29
The one difference between how things are handled in
early_exit
example versus this PR is getting the error code out:https://github.com/wasmerio/wasmer/blob/master/examples/early_exit.rs#L93
When I tried the same method:
I got no error out of it. However, the error was printed as
RuntimeError(User(EarlyExit))
, so the error was handled correctly. (User
indicates that a user-defined error triggered a WASM trap).So, I ended up handling the error this way:
https://github.com/FuelLabs/fuel-indexer/pull/1337/files#diff-fa722c66d2896c684e5a26aa2c56b7ec1734888609fbb64ecbc7dfd6a420c4ffR933
I am unsure why I couldn't
downcast
the error the same way as was shown in the example.Testing steps
Manual testing:
Expected output:
In particular, this line indicates that an early termination happened due to the kill switch:
Changelog
sqlx
database operation errors.