Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error behavior when used as a library #1778

Open
paulromano opened this issue Mar 2, 2021 · 2 comments
Open

Error behavior when used as a library #1778

paulromano opened this issue Mar 2, 2021 · 2 comments

Comments

@paulromano
Copy link
Contributor

One of OpenMC's "bad" behaviors right now is that when it encounters an error, it most circumstances it will call std::exit (or MPI_Abort when using MPI). This is not a desirable behavior if OpenMC is used as a library (e.g., as part of a multiphysics application). It also sucks when you're using the openmc.lib Python bindings to the C/C++ API and an error takes down the whole executable rather than bubbling up as an exception. We should be following community guidelines with respect to error handling. For example, one of the recommendations in the xSDK community package policies (R3) is:

It is recommended that each package adopt and document a consistent system for propagating/returning error conditions/exceptions and provide an API for changing the behavior. For example, all routines may, by default, return an error code with the option of changing it to generate an abort on the error (for running in the debugger). No package should have hardwired calls to abort, exit, or MPI_Abort(). Also, no package should have hardwired print statements for error conditions. Each package should document which error codes/exceptions are recoverable, which may result in lost resources (for example, unfreed memory), and which indicate that the process may be in an undefined or totally broken state (for example, after a segmentation violation). It is the responsibility of the calling routine not to simply continue the computation when a “hard” error is returned; the calling routine will likely, by default, call an abort, but again that should be possible to override.

What we have is currently a bit of a mess:

  1. Some errors will result in std::exit or MPI_Abort
  2. Some errors result in C++ exceptions being thrown
  3. Some C API functions will return an error code (these are indeed converted to exceptions in openmc.lib Python bindings)
@gridley
Copy link
Contributor

gridley commented Mar 2, 2021

This is something I'd like to help take on because error handling on GPU code is currently impossible. My current thinking is that we should add some new global variable like error_state that is an enum of possible error codes (or maybe each thread should own a copy?). The python API functions could work as-is and return that code. Then if running main() from C++, you'd just return back to main() and add some function for printing helpful error messages before exiting or possibly returning that error code.

If we took this approach on GPU, it would be nice, because this error flag could be set by some thread on the device, then the appropriate handling be done on the CPU after the kernel is done.

@paulromano
Copy link
Contributor Author

I think that approach sounds pretty nice. One of the nice things about storing the state in a global as opposed to returning the value is that you don't need to pass the return value through nested function calls. Exceptions also solve that problem, but of course they are no-go for GPUs. Note that we already have a global openmc_err_msg variable that is used by openmc.lib to get the error message.

For simplicity, we might want to just have the error state be an int rather than an enum so that it is clearly visible through a C API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants