Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iree_status_t/Status reworking #265

Closed
7 tasks done
benvanik opened this issue Dec 26, 2019 · 5 comments · Fixed by #2849
Closed
7 tasks done

iree_status_t/Status reworking #265

benvanik opened this issue Dec 26, 2019 · 5 comments · Fixed by #2849
Assignees
Labels
enhancement ➕ New feature or request runtime Relating to the IREE runtime library

Comments

@benvanik
Copy link
Collaborator

benvanik commented Dec 26, 2019

I've been thinking about how to get good Status-like features (particularly chained messages and stack traces) through the C API. Our big limitation is that we have no RAII for the return values and as such storing anything inside of them (ala Status payloads) mandates that callers do something to them.

I've looked at a few other ML/mathish frameworks to see what they do (as that's what people coming into IREE may be expecting):

  • TF uses TF_Status output args that require manual deletion
    • They have no async methods so no need to handle that
    • Data is just a code and a (internally) a Status with whatever that has
  • tflite 🤷‍♀
  • SNPE errno thread-local errors
  • Torch uses THErrorHandlerFunction and thread-local error codes
  • FANN uses stack-local error reporting (painful and leak-prone)
  • CoreML has exceptions
  • MetalPerformanceShaders is mostly error-handling free (akin to Metal), largely (and particularly for kernels) exception free and almost no other handling otherwise
  • DirectML relies on HRESULTs in most calls but is like MetalPerformanceShaders in that submission is left to the underlying API (D3D12 in this case)

Since our HAL is Vulkan(/D3D12/Metal)-like, following DirectML/MPS seems fine. Vulkan uses a combo of per-call error codes on a small set of functions (mainly those that allocate or submit/wait) and otherwise leaves everything to VK_EXT_debug_report per-instance handlers.

Some design points that fall out of this:

  • invalid arguments/failed preconditions are returned to callers on a per-call basis
  • async error granularity is per-submission with nothing finer (per-batch, per-command buffer, etc)
  • allocations for real memory (not transient memory) can synchronously fail and returned to callers
  • allocations for future memory (transient memory) may fail asynchronously without notification (leading to submission failure)
  • any individual command recorded to a command buffer may fail asynchronously (leading to submission failure)
  • any sync per-call failure of a submission request leaves the state in a valid state
  • any async failure of submission execution leaves any state that may have been modified by the submission invalid (including internal state)
  • device loss must be handled (meaning that every buffer/in-flight submission/etc for a particular device is made invalid)
  • recovery of submission failures is not possible; applications must decide what to do (snapshot at some frequency if they want to be able to restore, etc)

This gives us a few major cases to cover:

  • synchronous per-call errors
  • asynchronous per-submission (=invocation) errors

I think we can cover both of these without needing a diagnostics reporter like callback system. As users can't legally use invocation results without asking for completion we have a good point to return any async errors that accumulated. This avoids the need for thread-local/errno style errors and to more closely correlate errors across binding boundaries (as we always have them on the stack).

If this was pure C++ we'd just use Status and be done with it. Unfortunately we can't easily do that without RAII, and Status itself doesn't really have some of the things we'd want to do; for example, it'd be nice to include both host and VM stack traces in statuses.

So given that a per-call Status-like thing can handle all our cases the real question comes down to what kind of information do we want to attach and how do we attach it without making it painful to use.

The approaches I'm debating between are:

  1. mandatory caller deletion of statuses
  2. optional caller-provided per-call error callbacks

I'm leaning towards caller-deletion because the common cases look a lot more like traditional Status behavior and avoid a lot of boilerplate. For example, propagating status back to callers:

iree_status_t my_call(...);
iree_status_t my_caller() {
  // Just returning the status; no need for deletion here.
  IREE_RETURN_IF_ERROR(my_call(...) , "optional added message %d", 5);
  return iree_status_ok();
}
int main() {
  // CHECK_OK deletes the status; this will be the common case around errors that the caller doesn't really care to handle.
  IREE_CHECK_OK(my_caller(), "optional added message");
  // Otherwise, get the status back and do stuff to it, taking care to delete it:
  iree_status_t status = my_caller();
  bool is_aborted = iree_status_is_aborted(status);
  iree_status_free(status);
  return is_aborted ? 1 : 0;
}

If in C++ we can use if-scoping to make the cleanup clearer:

if (auto status = my_caller()) {
  iree_status_free(status);
}
// or, wrap in RAII:
if (auto status = ApiStatus(my_caller()) {
  status.special_stack_trace_magic();
}
// or, wrap in Status:
RETURN_IF_ERROR(FromApiStatus(my_caller()));

We can use the must-use-result tag to make errors of omission easier to catch (wrap in an IREE_CHECK_OK and continue, or properly handle). ASAN should pretty clearly show leaks in other cases.

To keep the generated code smaller my thought is to have iree_status_t be a uintptr_t with OK being == 0. This should make the common error-free case no more expensive than if a bool/int error was returned and all checks can be non-zero/zero checks instead of specific equality. There's also no need to dereference the status to perform flow control decisions.

As an optimization my thought is to use the lower 4 or 5 bits of the uintptr_t to hold the Status-compatible error code. This only requires that we alloc real status payloads as 32-byte aligned (not a big deal given the frequency) but for payload-free statuses we can entirely avoid allocation and dereferencing; the status is just as heavy as an int. So iree_status_is_aborted() and other helper functions are just masking off the code bits and comparing with the value in a register, meaning that if we are returning things like deadline-exceeded (which can happen in tight fence completion query loops) or aborted flags we have zero performance hit. Only when we want to add payloads (stack traces, messages, etc) would we really allocate memory for the status.

It also makes it possible to - in super tiny builds - strip all custom payloads and map iree_status_t to uint8_t (or whatever) without any change in code required.

For moving between languages/layers we can have the payload alias other internal payloads. For example, we can have a Status* stored in the payload to allow full-fidelity status propagation from C++ through the C API and up to callers.

Implementation plan is:

  • add macros for IREE_CHECK_OK/etc and helpers for iree_status_is_*
  • replace all error code comparisons with the above
  • add iree_status_free
  • make iree_status_t a must-use value
  • ensure all sites are either handling the status and using iree_status_free or the above macros
  • switch iree_status_t to an opaque type
  • implement iree_status_t allocation and payload chaining

@stellaraccident SG? any thoughts from a bindings-perspective?

@benvanik benvanik added the enhancement ➕ New feature or request label Dec 26, 2019
@benvanik benvanik self-assigned this Dec 26, 2019
copybara-service bot pushed a commit that referenced this issue Jan 2, 2020
This makes it possible to change the type of iree_status_t in future changes.

Progress on issue #265.

PiperOrigin-RevId: 287847316
copybara-service bot pushed a commit that referenced this issue Jan 2, 2020
This makes it possible to change the type of iree_status_t with most code not caring about what its bits are.

Progress on issue #265.

PiperOrigin-RevId: 287847316
copybara-service bot pushed a commit that referenced this issue Jan 2, 2020
This makes it possible to change the type of iree_status_t with most code not caring about what its bits are.

Progress on issue #265.

PiperOrigin-RevId: 287847316
copybara-service bot pushed a commit that referenced this issue Jan 2, 2020
This makes it possible to change the type of iree_status_t with most code not caring about what its bits are.

Progress on issue #265.

PiperOrigin-RevId: 287847316
copybara-service bot pushed a commit that referenced this issue Jan 2, 2020
This makes it possible to change the type of iree_status_t with most code not caring about what its bits are.

Progress on issue #265.

PiperOrigin-RevId: 287847316
copybara-service bot pushed a commit that referenced this issue Jan 3, 2020
This makes it possible to change the type of iree_status_t with most code not caring about what its bits are.

Progress on issue #265.

PiperOrigin-RevId: 287847316
copybara-service bot pushed a commit that referenced this issue Jan 3, 2020
This makes it possible to change the type of iree_status_t with most code not caring about what its bits are.

Progress on issue #265.

PiperOrigin-RevId: 288026341
@benvanik
Copy link
Collaborator Author

benvanik commented Feb 3, 2020

See example workaround in #657 that should be supported.

@benvanik
Copy link
Collaborator Author

benvanik commented Feb 11, 2020

Spent some time code golfing, got something that seems usable as a foundation:
https://gcc.godbolt.org/z/kD5XwH

Based on a storage struct created when the non-status is created and then a linked list of payloads. The final handler of the status (likely whatever binding code is calling into the C API) is responsible for freeing the status memory in the case of a failure.

This lets us write things like this to append messages to status values:

iree_status_t foo(int val) {
  IREE_RETURN_IF_ERROR(DoSomething());
  IREE_RETURN_IF_ERROR(DoSomething());
  IREE_RETURN_IF_ERROR(DoSomething(), "A");
  IREE_RETURN_IF_ERROR(DoSomething(), "B%i", val);
  IREE_RETURN_IF_ERROR(DoSomething(), "C%i%i", val + 1, val + 2);
  return IREE_STATUS_OK;
}

I was able to make it so that all error handling code is moved out of the core path which will be great for I$ usage. Example of foo() above:

foo:
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     x19, [sp, 16]
        mov     w19, w0
        bl      DoSomething
        mov     x1, x0
        cbnz    x0, .L29
        bl      DoSomething
        mov     x1, x0
        cbnz    x0, .L29
        bl      DoSomething
        cbnz    x0, .L37
        bl      DoSomething
        cbnz    x0, .L38
        bl      DoSomething
        mov     x1, x0
        cbnz    x0, .L39
.L29:
        mov     x0, x1
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 32
        ret
; error handling here

So the cost in success cases is no worse than a bool. If you aren't annotating things, which is the common case, then simple checks will fold; for example above L29 is the function exit that is used for the first two non-annotating RETURN_IF_ERRORs as well as the function exit, so the only code size increases vs simple bool handling is when annotating.

Even then, the annotations are pretty good. I force no inlining so that the code generated is always just the trampoline to the creation function. For example, annotating an unformatted message:

        ldr     x19, [sp, 16]
        adrp    x1, .LC4
        ldp     x29, x30, [sp], 32
        add     x1, x1, :lo12:.LC4
        b       iree_status_annotate  ; tail call woo!

So, in ticking the boxes:

  • no more expensive than using bool
  • enables use of tail-call optimization
  • pay-for-what-you-use code size increases (annotations, formatting)
  • rich formatting
  • thread-safe/context-safe (no errno-style globals/thread_locals required)
  • stores source_location (file/line)
  • custom payload support for stack traces (host, vm, accelerator, etc) including custom formatters
  • absl::Status wrapping (as a payload)
  • MUST_USE_RESULT to avoid leaks
  • can trivially strip all annotations/allocations and turn status into a pure bool-like solution
  • (negative) manual cleanup required, but uncommon

The manual cleanup is annoying, but required for calling code written in C. My next step is to see if I can make a C++ wrapper that is easy to use. If I can, then I may switch iree::Status to the wrapper and use this C impl as the base everywhere. That way we can preserve payloads in non-string forms to allow the bindings on the other side of the C boundary to extract the payloads (and say, propagate errors in more native forms).

@stellaraccident
Copy link
Collaborator

Ohhh! Very nice!

@silvasean
Copy link
Contributor

Very nice! This is going to be awesome!

@antiagainst
Copy link
Contributor

C macros and compiler builtin functions FTW! :)

@benvanik benvanik added the runtime Relating to the IREE runtime library label Mar 19, 2020
@benvanik benvanik added this to the 2020Q2 Core milestone Mar 26, 2020
@benvanik benvanik modified the milestones: 2020Q2 Core, 2020Q3 Core Jun 3, 2020
benvanik added a commit that referenced this issue Aug 6, 2020
benvanik added a commit that referenced this issue Aug 6, 2020
benvanik added a commit that referenced this issue Aug 6, 2020
benvanik added a commit that referenced this issue Aug 7, 2020
benvanik added a commit that referenced this issue Aug 7, 2020
Progress on #265.

# Conflicts:
#	iree/vm/bytecode_dispatch.c
benvanik added a commit that referenced this issue Aug 11, 2020
benvanik added a commit that referenced this issue Aug 11, 2020
Progress on #265.

# Conflicts:
#	iree/vm/bytecode_dispatch.c
benvanik added a commit that referenced this issue Aug 12, 2020
benvanik added a commit that referenced this issue Aug 12, 2020
Progress on #265.

# Conflicts:
#	iree/vm/bytecode_dispatch.c
benvanik added a commit that referenced this issue Aug 17, 2020
ScottTodd added a commit that referenced this issue Aug 17, 2020
…oss the VM (#2849)

This fixes #265 by implementing the described iree_status_t behavior and changing Status/StatusBuilder/StatusOr to wrap it. By doing this I was able to remove quite a bit of gunk that was required to move between the Google-style C++ Status and the API (such as ToApiStatus/FromApiStatus).

By being able to integrate iree_status_t into the C++ types I was also able to unify all of the macros - so IREE_CHECK_OK and IREE_RETURN_IF_ERROR can take either iree_status_t or Status/etc. Because the behavior is now different we can no longer use the iree/base/google/ versions and as such I've renamed all macros such that there are no collisions with the non-IREE versions.
Now, IREE_CHECK_OK/IREE_RETURN_IF_ERROR when used in C code can take optional printf-style arguments to add annotations to results, for example: `IREE_RETURN_IF_ERROR(OtherFunc(...), "with a value: %d", 5);`
When used in C++ the macros still accept printf-style arguments but can also use the ostream style: `IREE_RETURN_IF_ERROR(OtherFunc(...)) << "with a value: " << 5;`. Though it's supported one shouldn't mix them!

The most interesting changes in this branch are in iree/base/api.h + iree/base/api.c and the iree/base/internal/status* files. Other changes are mechanical cleanups required to avoid name conflicts, enable C compilation of core parts (or code to be used both in C and C++), and removing abseil deps from files that are C compatible.

A lot of focus was put on keeping the status usage - even if source location, messages, and stack traces are attached - from increasing codesize or overhead when on the happy success path. https://gcc.godbolt.org/z/xGsPc3

For example, this chain of calls and checks:
![image](https://user-images.githubusercontent.com/75337/89965402-91f43480-dc01-11ea-8514-d307828f4910.png)

Ends up as the minimal possible code (while still having any kind of success checks):
![image](https://user-images.githubusercontent.com/75337/89965439-aafce580-dc01-11ea-9359-fade63cac4ba.png)

The error handling cases are moved out of the way and (hopefully) kept out of cache. The handlers are able to hit the tail call optimization happy path and keeps the whole function code size smaller:
![image](https://user-images.githubusercontent.com/75337/89965512-e697af80-dc01-11ea-9b2c-1c42aabbb788.png)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement ➕ New feature or request runtime Relating to the IREE runtime library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants