Skip to content
This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

enhancement: WASM error codes, early exit on error or kill switch #1337

Merged
merged 23 commits into from
Sep 15, 2023

Conversation

lostman
Copy link
Contributor

@lostman lostman commented Sep 7, 2023

Description

UPDATE: in addition to adding error codes to host functions as described below, this PR adds another host function:

fn early_exit(err_code: WasmIndexerError) -> !

Using this function we can remove more unwrap or expect calls.

Building on #1293.

This PR follows the example in wasmer repository:

https://github.com/wasmerio/wasmer/blob/master/examples/early_exit.rs#L60

    fn early_exit() -> Result<(), ExitCode> {
        // This is where it happens.
        Err(ExitCode(1))
    }

Note that the return value is Result<(), ExitCode> while the function is imported in WASM as

https://github.com/wasmerio/wasmer/blob/master/examples/early_exit.rs#L41

  (type $early_exit_t (func (param) (result)))

That is, as if it were fn early_exit();.

This PR does the same for our FFI functions. For instance:

fn put_object(
    mut env: FunctionEnvMut<IndexEnv>,
    type_id: i64,
    ptr: u32,
    len: u32,
) -> Result<(), WasmIndexerError> { }

While their types—as declared to the WASM module which will use them—doesn't mention the Result value:

https://github.com/FuelLabs/fuel-indexer/pull/1337/files#diff-447e55cb0fae7d02ac8fd8f7de5a521cbf60d3f2422b5166c39faeb9f995bea6R21-R29

The one difference between how things are handled in early_exit example versus this PR is getting the error code out:

https://github.com/wasmerio/wasmer/blob/master/examples/early_exit.rs#L93

When I tried the same method:

Err(e) => match e.downcast::<ExitCode>()

I got no error out of it. However, the error was printed as RuntimeError(User(EarlyExit)), so the error was handled correctly. (User indicates that a user-defined error triggered a WASM trap).

So, I ended up handling the error this way:

https://github.com/FuelLabs/fuel-indexer/pull/1337/files#diff-fa722c66d2896c684e5a26aa2c56b7ec1734888609fbb64ecbc7dfd6a420c4ffR933

            } else {
                if let Some(e) = e
                    .source()
                    .and_then(|e| e.downcast_ref::<WasmIndexerError>())
                {
                    error!("Indexer({uid}) WASM execution failed: {e}.");
                } else {
                    error!("Indexer({uid}) WASM execution failed: {e:?}.");
                };
                self.db.lock().await.revert_transaction().await?;
                return Err(IndexerError::from(e));
            }

I am unsure why I couldn't downcast the error the same way as was shown in the example.

Testing steps

Manual testing:

  1. Start an indexer.
cargo run -p fuel-indexer -- run --fuel-node-host beta-4.fuel.network --fuel-node-port 80 --replace-indexer --manifest examples/fuel-explorer/fuel-explorer/fuel_explorer.manifest.yaml
  1. Redeploy:
cargo run -p forc-index -- deploy --path examples/fuel-explorer/fuel-explorer --replace-indexer

Expected output:

2023-09-07T09:12:49.663539Z  INFO fuel_indexer::service: 399: Resuming Indexer(fuellabs.explorer) from block 1240
2023-09-07T09:12:54.321165Z  INFO fuel_indexer::database: 206: Database loading schema for Indexer(fuellabs.explorer) with Version(143093c6bbe3a8937f4fb71514cbe6799266c44398866dbbd23e12b977bc1641).
2023-09-07T09:12:54.346555Z  INFO fuel_indexer::executor: 110: Indexer(fuellabs.explorer) subscribing to Fuel node at beta-4.fuel.network:80
2023-09-07T09:12:54.346681Z  WARN fuel_indexer::executor: 117: No end_block specified in manifest. Indexer will run forever.
2023-09-07T09:12:54.346717Z  INFO fuel_indexer::service: 335: Indexer(fuellabs.explorer) was replaced. Stopping previous version of Indexer(fuellabs.explorer).
2023-09-07T09:12:54.347376Z ERROR fuel_indexer::executor: 941: Indexer(fuellabs.explorer) WASM execution failed: Kill switch has been triggered.
2023-09-07T09:12:54.347819Z  INFO fuel_indexer::executor: 199: Kill switch flipped, stopping Indexer(fuellabs.explorer). <('.')>

In particular, this line indicates that an early termination happened due to the kill switch:

2023-09-07T09:12:54.347376Z ERROR fuel_indexer::executor: 941: Indexer(fuellabs.explorer) WASM execution failed: Kill switch has been triggered.

Changelog

  • FFI functions can access the kill switch indicator and exit early when the kill switch has been triggered.
  • FFI functions can exit early on error, including sqlx database operation errors.
  • Add WASM error codes (from @deekerno's PR)
  • Get an error code from the call to the indexer's WASM module.

@lostman lostman self-assigned this Sep 7, 2023
@ra0x3
Copy link
Contributor

ra0x3 commented Sep 8, 2023

@lostman

@lostman
Copy link
Contributor Author

lostman commented Sep 8, 2023

@lostman What's the difference between this and #1293 ?

@deekerno's PR was a starting point. This PR adds a few things. From changelog:

FFI functions can access the kill switch indicator and exit early when the kill switch has been triggered. (added in this PR)
FFI functions can exit early on error, including sqlx database operation errors. (added in this PR)
Add WASM error codes (from @deekerno's PR)
Get an error code from the call to the indexer's WASM module. (added in this PR)

The most significant difference is how the error codes are returned:

e74645e#diff-447e55cb0fae7d02ac8fd8f7de5a521cbf60d3f2422b5166c39faeb9f995bea6L27-L28

In @deekerno's PR, the result is u32 error code. As I explain in the description, in this PR, I follow the early_exit example—the function signature, as declared in the FFI interface mentions no Result type. wasmer terminates the execution immediately if the host function returns an error this way (which is exactly what we want here).

This way, even get_object, which returns u32, can be terminated early:

https://github.com/FuelLabs/fuel-indexer/pull/1337/files#diff-25a226eeb0da9150a8af6c4267e5745445aab8a84cbe942fa13d8a7756246c11R119

It now returns Result<u32, WasmIndexerError>. However, in the FFI interface, it is still -> *mut u8.

@ra0x3
Copy link
Contributor

ra0x3 commented Sep 11, 2023

@lostman

  • Reading over your testing steps, I see the expected output for this PR, however, what is the result as of today on develop? Trying to compare what we have now (prior to this PR) to what we will have after this is merged

Copy link
Contributor

@ra0x3 ra0x3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lostman

  • Following your testing steps I get the following output:
2023-09-11T16:02:51.282146Z  WARN fuel_indexer::executor: 117: No end_block specified in manifest. Indexer will run forever.
2023-09-11T16:02:51.282175Z  INFO fuel_indexer::service: 335: Indexer(fuellabs.explorer) was replaced. Stopping previous version of Indexer(fuellabs.explorer).
2023-09-11T16:02:51.282689Z ERROR fuel_indexer::executor: 937: Indexer(fuellabs.explorer) WASM execution failed: Kill switch has been triggered.
2023-09-11T16:02:51.283092Z  INFO fuel_indexer::executor: 195: Kill switch flipped, stopping Indexer(fuellabs.explorer). <('.')>

UX Questions:

  • Why am I seeing an error! about something failing?
    • My re-deployment worked so ideally I shouldn't see any errors (as this can be confusing)
      • Maybe we can warn! any additional details?
  • The last log message: "kill switch flipped" implies that the previous indexer was stopped (ok), but the message stops there, there's no additional info to suggest that my re-deployment worked

Still reviewing the PR. But the logs are confusing me as to what's actually happening (from a user perspective)

Copy link
Contributor

@deekerno deekerno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up, @lostman! I left a few comments on the code itself.

Also, in addition to your manual testing, we should figure out a way to test the error codes themselves. Part of the reason my PR was still a WIP was that I hadn't figured out a good way to cause the errors (and that I was busy with other things 😅).

packages/fuel-indexer/src/database.rs Outdated Show resolved Hide resolved
packages/fuel-indexer/src/ffi.rs Show resolved Hide resolved
@lostman
Copy link
Contributor Author

lostman commented Sep 11, 2023

@ra0x3, agreed. I will add a special-case for the kill switch. That is, after all, the expected behavior.

Copy link
Contributor

@ra0x3 ra0x3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Did a first pass
  • For changes like this it would give us more confidence if there was an associated test to prove the functionality works (we already don't have many service or executor tests as is)
    • Tests also make the review easier

packages/fuel-indexer-macros/src/decoder.rs Outdated Show resolved Hide resolved
packages/fuel-indexer-macros/src/decoder.rs Outdated Show resolved Hide resolved
packages/fuel-indexer-plugin/src/wasm.rs Outdated Show resolved Hide resolved
packages/fuel-indexer-plugin/src/wasm.rs Show resolved Hide resolved
packages/fuel-indexer-plugin/src/wasm.rs Outdated Show resolved Hide resolved
packages/fuel-indexer/src/executor.rs Outdated Show resolved Hide resolved
@lostman
Copy link
Contributor Author

lostman commented Sep 12, 2023

I added a WasmIndexExecutor test for exit codes. It is a bit crude but covers the basics.

Copy link
Contributor

@ra0x3 ra0x3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some more feedback

  • @lostman I also wouldn't rush this PR
  • We do have a deadline, but this work should be able to make it in comfortably before that date, with plenty of time to spare
  • This is an extremely sensitive PR, so let's not rush it

packages/fuel-indexer-database/src/lib.rs Show resolved Hide resolved
packages/fuel-indexer-api-server/src/uses.rs Show resolved Hide resolved
packages/fuel-indexer-lib/src/lib.rs Outdated Show resolved Hide resolved
packages/fuel-indexer-lib/src/lib.rs Outdated Show resolved Hide resolved
packages/fuel-indexer-lib/src/lib.rs Outdated Show resolved Hide resolved
packages/fuel-indexer-lib/src/lib.rs Outdated Show resolved Hide resolved
packages/fuel-indexer-lib/src/lib.rs Outdated Show resolved Hide resolved
@@ -833,7 +833,7 @@ impl From<ObjectDecoder> for TokenStream {
.map(|query| query.to_string())
.collect::<Vec<_>>();

d.lock().await.put_many_to_many_record(queries).await;
d.lock().await.put_many_to_many_record(queries).await.expect(&format!("Entity::save_many_to_many for {} failed.", stringify!(#ident)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think here we should be logging the error messages (and .expect) as close to the actual culprit call as possible
  • In this case the culprit of this error would be fuel_indexer::database::Database:: put_many_to_many_record
    • If the error is logged/handled there, we don't need to handle it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is handled in put_many_to_many_record. In WASM, it is a host function that returns Result<>, which triggers early termination.

The code here is for native execution. This expect achieves an equivalent behavior.

@@ -868,7 +871,7 @@ impl From<ObjectDecoder> for TokenStream {
Self::TYPE_ID,
self.to_row(),
serialize(&self.to_row())
).await;
).await.expect(&format!("Entity::save for {} failed.", stringify!(#ident)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here:

  • I think here we should be logging the error messages (and .expect) as close to the actual culprit call as possible
  • In this case the culprit of this error would be fuel_indexer::database::Database:: put_object
    • If the error is logged/handled there, we don't need to handle it here

Copy link
Contributor Author

@lostman lostman Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is handled in put_object. In WASM, it is a host function that returns Result<>, which triggers early termination.

The code here is for native execution. This expect achieves an equivalent behavior.

packages/fuel-indexer-tests/tests/service.rs Outdated Show resolved Hide resolved
@lostman
Copy link
Contributor Author

lostman commented Sep 12, 2023

Left some more feedback

  • @lostman I also wouldn't rush this PR
  • We do have a deadline, but this work should be able to make it in comfortably before that date, with plenty of time to spare
  • This is an extremely sensitive PR, so let's not rush it

I'd rather merge this now (very soon) and have it battle-tested for a couple of weeks than merge it just before the deadline and have no time to resolve any issues that may arise.

I need this to finish #1150 and again, I'd rather test both for a couple of weeks before the deadline is upon us.

@lostman lostman requested a review from ra0x3 September 12, 2023 16:14
@ra0x3 ra0x3 linked an issue Sep 12, 2023 that may be closed by this pull request
@lostman lostman force-pushed the maciej/wasm-error-codes branch 2 times, most recently from b6c0a78 to adee0f8 Compare September 13, 2023 11:09
@lostman
Copy link
Contributor Author

lostman commented Sep 13, 2023

@ra0x3, @deekerno,

I added an early_exit function and tried using it in handler_block_wasm:

adee0f8#diff-7121bb4841992a7a257dbb97c0e60f371c59f2845b4413d5413b7adf779feb5aR22

and some trybuild tests failed to link, complaining about missing ff_early_exit at the linking stage:

https://github.com/FuelLabs/fuel-indexer/actions/runs/6171631718/job/16750296869#step:13:106

All integration tests succeeded, and I could compile and deploy indexers.

Any idea what could be going on?

It would've been a nice use-case for this function.

@ra0x3
Copy link
Contributor

ra0x3 commented Sep 13, 2023

Any idea what could be going on?

@lostman

  • In the trybuild tests, we manually add the missing FF symbols
  • Example
  • Let me know if this helps

@lostman
Copy link
Contributor Author

lostman commented Sep 13, 2023

@ra0x3, yes, it helps. Thanks!

Just pushed this:
46507e8

And trybuild tests pass 😄 (locally; waiting on CI)

@ra0x3
Copy link
Contributor

ra0x3 commented Sep 15, 2023

@lostman Is this mean to be re-reviewed? I do remember leaving a previous review, but I see I'm ping'd for review again. But I also see a lot of the feedback was not implemented (and as well no reason was provided as to why).

@lostman
Copy link
Contributor Author

lostman commented Sep 15, 2023

@ra0x3, yes, it is meant to be re-reviewed. I can't re-trigger the review request since it is already re-requested.

I implemented your feedback. What is missing?

@ra0x3
Copy link
Contributor

ra0x3 commented Sep 15, 2023

@lostman The review comments are unresolved, and they aren't labeled as "outdated". For any review comments where you implement the feedback, be sure to resolve the conversation (so it doesn't clutter the review, and look as if it's unresolved). And for anything you didn't implement, just leave a responding comment as to why you chose to to implement the feedback.

@lostman
Copy link
Contributor Author

lostman commented Sep 15, 2023

Apologies. GitHub UI tricked me a little:

Screenshot 2023-09-15 at 4 31 14 PM

Completely missed these comments. 😂

ra0x3
ra0x3 previously approved these changes Sep 15, 2023
@ra0x3
Copy link
Contributor

ra0x3 commented Sep 15, 2023

@lostman CI failing

@ra0x3 ra0x3 merged commit 1b345b0 into develop Sep 15, 2023
18 checks passed
@ra0x3 ra0x3 deleted the maciej/wasm-error-codes branch September 15, 2023 18:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add error codes to WASM executors
3 participants