Simulation errors are not propagated when using cse_execution_start() #415

Closed
eidekrist opened this issue Oct 9, 2019 · 5 comments · Fixed by #422
Labels
bug Something isn't working

Comments

@eidekrist
Member

Setup: Client code using csecorec, for example cse-server-go.

The client code calls cse_execution_start(), and after a while a simulation error occurs in a slave. The simulation error is caught in algorithm::do_step() and rethrown, but not properly caught in cse.cpp. From the client's point of view, simulation time is no longer progressing, but execution->state is still CSE_EXECUTION_RUNNING. Calling cse_execution_stop() results in the exception being caught, and execution->state becomes CSE_EXECUTION_ERROR. The error state should be set as soon as the simulation error occurs.
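
A minimal sketch of how this looks from the client side, assuming cse.h's status-polling API (that cse_execution_status exposes current_time and state fields is an assumption inferred from the description above):

```cpp
// Hedged sketch, not actual cse-server-go code: polls the execution
// from the client side to show the symptom.
#include <cse.h>

#include <chrono>
#include <cstdio>
#include <thread>

void watch(cse_execution* execution)
{
    cse_execution_start(execution); // simulation runs in a background thread

    for (int i = 0; i < 60; ++i) {
        std::this_thread::sleep_for(std::chrono::seconds(1));

        cse_execution_status status;
        cse_execution_get_status(execution, &status);

        // Symptom: after a slave fails, current_time stops advancing, yet
        // state is still CSE_EXECUTION_RUNNING. The error only surfaces
        // once cse_execution_stop() is called.
        std::printf("t=%lld state=%d\n",
            static_cast<long long>(status.current_time),
            static_cast<int>(status.state));
    }
}
```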

@kyllingstad
Member

> The error state should be set as soon as the simulation error occurs.

I disagree. Consider this scenario:

  1. I create an execution and call cse_execution_start() to start running the simulation in a background thread.
  2. In the foreground thread, I create another execution and call cse_execution_start() to run that in a different background thread.
  3. At some point, both simulations fail, and not necessarily for the same reason.

How do I know which execution has now set the error state?

Note that I don't even have to be running two executions. In step 2 I could call any potentially-failing function in cse.h and be equally confused. Nor do both executions in my example have to be started from the same foreground thread. Since what you suggest necessarily implies a global error state, I could be calling cse.h functions in two different "foreground" threads in two completely separate and unrelated parts of my program and not know where the error comes from.

Hence, I believe that the error state must be thread local.

This, again, means that the error state can only be set by a function called in the foreground thread, and it cannot change magically before the next call of any potentially-failing function in the same thread.
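
A minimal sketch of the errno-style pattern being argued for here, reusing the g_lastError* names that come up later in the thread; the thread_local qualifier is the whole point, and the helper function is hypothetical:

```cpp
#include <string>
#include <utility>

// Thread-local: each thread sees only the errors raised by its own
// cse.h calls, so two unrelated "foreground" threads cannot clobber
// each other's error state.
thread_local int g_lastErrorCode = 0;
thread_local std::string g_lastErrorMessage;

// Hypothetical helper: only ever called from the thread that invoked
// the failing cse.h function, never from a background simulation thread.
void set_last_error(int code, std::string message)
{
    g_lastErrorCode = code;
    g_lastErrorMessage = std::move(message);
}
```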

@kyllingstad
Member

I'd say that there are 2 main methods for handling errors in asynchronous function calls:

  • Poll: Provide a function that can query the cse_execution for its current state. Alternatively and equivalently, return a queryable handle from cse_execution_start(). In either case, it would be the query function that sets the thread-local error state, not the background thread.
  • Notify: Let client code supply a callback function to cse_execution_start() that is called whenever something happens, like an error. The callback would be called from the background thread, and it would be the client code's responsibility to ensure that it is properly synchronised. The foreground thread error state would not get modified in this case; all error information must be supplied to the callback.
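
cse.h has no callback mechanism, but a purely hypothetical shape for the "notify" variant could look like this, with the synchronisation burden on the client as described:

```cpp
#include <mutex>
#include <string>

// Hypothetical callback type (not part of cse.h). It would be invoked
// from the *background* simulation thread when an error occurs.
typedef void (*cse_error_callback)(
    int error_code, const char* message, void* user_data);

struct client_error_state
{
    std::mutex mutex;
    int code = 0;
    std::string message;
};

// Client-supplied callback. Because it runs on the background thread,
// the client must synchronise access to its own state.
void on_simulation_error(int error_code, const char* message, void* user_data)
{
    auto* state = static_cast<client_error_state*>(user_data);
    std::lock_guard<std::mutex> lock(state->mutex);
    state->code = error_code;
    state->message = message;
}

// Hypothetical registration at start time:
//   cse_execution_start_with_callback(execution, on_simulation_error, &state);
```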

@kyllingstad
Member

The simplest "poll" solution I can think of: We already have a boost::fibers::future<bool> object in cse_execution that represents the result of cse::execution::simulate_until(). We can call its wait_for() function with a timeout of 0 to determine in a non-blocking fashion whether the simulation has stopped, and if it has, call get() to transport the exception to the foreground thread where it can be handled in the usual manner.
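
A sketch of that poll, assuming the future is kept in a cse_execution member called simulate_result here (the member name is an assumption):

```cpp
#include <boost/fiber/future.hpp>

#include <chrono>

// Runs in the foreground thread; never blocks.
void poll_simulation(cse_execution* execution)
{
    auto& result = execution->simulate_result; // boost::fibers::future<bool>
    if (result.valid() &&
        result.wait_for(std::chrono::seconds(0)) ==
            boost::fibers::future_status::ready) {
        try {
            // get() rethrows any exception stored by the background
            // thread, transporting it to the foreground thread.
            result.get();
            execution->state = CSE_EXECUTION_STOPPED;
        } catch (...) {
            // Handled "in the usual manner": set the per-execution state
            // and the thread-local error code/message here.
            execution->state = CSE_EXECUTION_ERROR;
        }
    }
}
```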

@eidekrist
Member Author

Thanks for the feedback!

The error state I'm referring to is contained in cse_execution_s::state, so that is unique for each execution. The only global variables we have are g_lastErrorMessage and g_lastErrorCode, and for your first example finding out which execution created which error message is a challenge with the code as-is. I hadn't thought of any of your examples until now, but considering your multi-thread example I agree that these should be kept thread local.

I might have torn down too many fences with #422; the problem is that the last error code/message were being set from a different thread than the foreground thread. Polling does indeed seem like a better solution; I'll give it a go.

@kyllingstad
Member

> The error state I'm referring to is contained in cse_execution_s::state, so that is unique for each execution. The only global variables we have are g_lastErrorMessage and g_lastErrorCode,

Ok, I understand. I was referring to g_lastError* as the "error state" in my comments, in case that was unclear.

eidekrist added a commit that referenced this issue Oct 18, 2019
* #415 Propagate simulation errors from async simulation execution.

* #415 Replace DPController.fmu (only win64) with handwritten fail.fmu (win64 and linux64)

* #415 Take two with polling.

* #415 Update docstring after review

* #415 Get rid of execution->simulate_exception_ptr

* #415 Don't need to check future.get_exception_ptr() since future.get() will throw if exception has been set. We do however need to call future.get() before joining the execution thread, and we need to join the execution thread.

* #415 Perform running execution health check inside cse_execution_get_status. Keep throwing if an error already has occurred.
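
Combining the last two points, a hedged sketch of the resulting shape of cse_execution_get_status() (simulate_result and lastErrorMessage are assumed member names, not necessarily what cse.cpp actually uses):

```cpp
#include <boost/fiber/future.hpp>

#include <chrono>
#include <stdexcept>

int cse_execution_get_status(cse_execution* execution, cse_execution_status* status)
{
    try {
        // "Keep throwing if an error already has occurred":
        if (execution->state == CSE_EXECUTION_ERROR) {
            throw std::runtime_error(execution->lastErrorMessage);
        }
        // Running-execution health check: non-blocking poll of the future.
        auto& result = execution->simulate_result;
        if (result.valid() &&
            result.wait_for(std::chrono::seconds(0)) ==
                boost::fibers::future_status::ready) {
            result.get(); // rethrows any simulation error into this thread
        }
        // ... fill in *status from the execution as before ...
        return 0;
    } catch (...) {
        execution->state = CSE_EXECUTION_ERROR;
        // Set the thread-local error code/message here, in the calling thread.
        return -1;
    }
}
```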