Simulation errors are not propagated when using cse_execution_start() #415

Closed
eidekrist opened this issue Oct 9, 2019 · 5 comments · Fixed by #422
Labels
bug Something isn't working

Comments

@eidekrist
Member

Setup: Client code using csecorec, for example cse-server-go.

The client code calls cse_execution_start(), and after a while a simulation error occurs in a slave. The simulation error is caught in algorithm::do_step() and rethrown, but not properly caught in cse.cpp. From the client's point of view, simulation time is no longer progressing, but execution->state is still CSE_EXECUTION_RUNNING. Calling cse_execution_stop() results in the exception being caught, and execution->state becomes CSE_EXECUTION_ERROR. The error state should be set as soon as the simulation error occurs.
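
A minimal sketch of how this looks from the client side, assuming cse.h's status-polling API (that cse_execution_status exposes current_time and state fields is an assumption inferred from the description above):

```cpp
// Hedged sketch, not actual cse-server-go code: polls the execution
// from the client side to show the symptom.
#include <cse.h>

#include <chrono>
#include <cstdio>
#include <thread>

void watch(cse_execution* execution)
{
    cse_execution_start(execution); // simulation runs in a background thread

    for (int i = 0; i < 60; ++i) {
        std::this_thread::sleep_for(std::chrono::seconds(1));

        cse_execution_status status;
        cse_execution_get_status(execution, &status);

        // Symptom: after a slave fails, current_time stops advancing, yet
        // state is still CSE_EXECUTION_RUNNING. The error only surfaces
        // once cse_execution_stop() is called.
        std::printf("t=%lld state=%d\n",
            static_cast<long long>(status.current_time),
            static_cast<int>(status.state));
    }
}
```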

@kyllingstad
Member

> The error state should be set as soon as the simulation error occurs.

I disagree. Consider this scenario:

  1. I create an execution and call cse_execution_start() to start running the simulation in a background thread.
  2. In the foreground thread, I create another execution and call cse_execution_start() to run that in a different background thread.
  3. At some point, both simulations fail, and not necessarily for the same reason.

How do I know which execution has now set the error state?

Note that I don't even have to be running two executions. In step 2 I could call any potentially-failing function in cse.h and be equally confused. Nor do both executions in my example have to be started from the same foreground thread. Since what you suggest necessarily implies a global error state, I could be calling cse.h functions in two different "foreground" threads in two completely separate and unrelated parts of my program and not know where the error comes from.

Hence, I believe that the error state must be thread local.

This, again, means that the error state can only be set by a function called in the foreground thread, and it cannot change magically before the next call of any potentially-failing function in the same thread.
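
A minimal sketch of the errno-style pattern being argued for here, reusing the g_lastError* names that come up later in the thread; the thread_local qualifier is the whole point, and the helper function is hypothetical:

```cpp
#include <string>
#include <utility>

// Thread-local: each thread sees only the errors raised by its own
// cse.h calls, so two unrelated "foreground" threads cannot clobber
// each other's error state.
thread_local int g_lastErrorCode = 0;
thread_local std::string g_lastErrorMessage;

// Hypothetical helper: only ever called from the thread that invoked
// the failing cse.h function, never from a background simulation thread.
void set_last_error(int code, std::string message)
{
    g_lastErrorCode = code;
    g_lastErrorMessage = std::move(message);
}
```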

@kyllingstad
Member

I'd say that there are 2 main methods for handling errors in asynchronous function calls:

  • Poll: Provide a function that can query the cse_execution for its current state. Alternatively and equivalently, return a queryable handle from cse_execution_start(). In either case, it would be the query function that sets the thread-local error state, not the background thread.
  • Notify: Let client code supply a callback function to cse_execution_start() that is called whenever something happens, like an error. The callback would be called from the background thread, and it would be the client code's responsibility to ensure that it is properly synchronised. The foreground thread error state would not get modified in this case; all error information must be supplied to the callback.
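
cse.h has no callback mechanism, but a purely hypothetical shape for the "notify" variant could look like this, with the synchronisation burden on the client as described:

```cpp
#include <mutex>
#include <string>

// Hypothetical callback type (not part of cse.h). It would be invoked
// from the *background* simulation thread when an error occurs.
typedef void (*cse_error_callback)(
    int error_code, const char* message, void* user_data);

struct client_error_state
{
    std::mutex mutex;
    int code = 0;
    std::string message;
};

// Client-supplied callback. Because it runs on the background thread,
// the client must synchronise access to its own state.
void on_simulation_error(int error_code, const char* message, void* user_data)
{
    auto* state = static_cast<client_error_state*>(user_data);
    std::lock_guard<std::mutex> lock(state->mutex);
    state->code = error_code;
    state->message = message;
}

// Hypothetical registration at start time:
//   cse_execution_start_with_callback(execution, on_simulation_error, &state);
```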

@kyllingstad
Member

The simplest "poll" solution I can think of: We already have a boost::fibers::future<bool> object in cse_execution that represents the result of cse::execution::simulate_until(). We can call its wait_for() function with a timeout of 0 to determine in a non-blocking fashion whether the simulation has stopped, and if it has, call get() to transport the exception to the foreground thread where it can be handled in the usual manner.
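
A sketch of that poll, assuming the future is kept in a cse_execution member called simulate_result here (the member name is an assumption):

```cpp
#include <boost/fiber/future.hpp>

#include <chrono>

// Runs in the foreground thread; never blocks.
void poll_simulation(cse_execution* execution)
{
    auto& result = execution->simulate_result; // boost::fibers::future<bool>
    if (result.valid() &&
        result.wait_for(std::chrono::seconds(0)) ==
            boost::fibers::future_status::ready) {
        try {
            // get() rethrows any exception stored by the background
            // thread, transporting it to the foreground thread.
            result.get();
            execution->state = CSE_EXECUTION_STOPPED;
        } catch (...) {
            // Handled "in the usual manner": set the per-execution state
            // and the thread-local error code/message here.
            execution->state = CSE_EXECUTION_ERROR;
        }
    }
}
```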

@eidekrist
Member Author

Thanks for the feedback!

The error state I'm referring to is contained in cse_execution_s::state, so that is unique for each execution. The only global variables we have are g_lastErrorMessage and g_lastErrorCode, and for your first example finding out which execution created which error message is a challenge with the code as-is. I hadn't thought of any of your examples until now, but considering your multi-thread example I agree that these should be kept thread local.

I might have torn down too many fences with #422; the problem is that the last error code/message were being set from a different thread than the foreground thread. Polling does indeed seem like a better solution; I'll give it a go.

@kyllingstad
Member

> The error state I'm referring to is contained in cse_execution_s::state, so that is unique for each execution. The only global variables we have are g_lastErrorMessage and g_lastErrorCode,

Ok, I understand. I was referring to g_lastError* as the "error state" in my comments, in case that was unclear.

eidekrist added a commit that referenced this issue Oct 18, 2019
* #415 Propagate simulation errors from async simulation execution.

* #415 Replace DPController.fmu (only win64) with handwritten fail.fmu (win64 and linux64)

* #415 Take two with polling.

* #415 Update docstring after review

* #415 Get rid of execution->simulate_exception_ptr

* #415 Don't need to check future.get_exception_ptr() since future.get() will throw if exception has been set. We do however need to call future.get() before joining the execution thread, and we need to join the execution thread.

* #415 Perform running execution health check inside cse_execution_get_status. Keep throwing if an error already has occurred.
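
Combining the last two points, a hedged sketch of the resulting shape of cse_execution_get_status() (simulate_result and lastErrorMessage are assumed member names, not necessarily what cse.cpp actually uses):

```cpp
#include <boost/fiber/future.hpp>

#include <chrono>
#include <stdexcept>

int cse_execution_get_status(cse_execution* execution, cse_execution_status* status)
{
    try {
        // "Keep throwing if an error already has occurred":
        if (execution->state == CSE_EXECUTION_ERROR) {
            throw std::runtime_error(execution->lastErrorMessage);
        }
        // Running-execution health check: non-blocking poll of the future.
        auto& result = execution->simulate_result;
        if (result.valid() &&
            result.wait_for(std::chrono::seconds(0)) ==
                boost::fibers::future_status::ready) {
            result.get(); // rethrows any simulation error into this thread
        }
        // ... fill in *status from the execution as before ...
        return 0;
    } catch (...) {
        execution->state = CSE_EXECUTION_ERROR;
        // Set the thread-local error code/message here, in the calling thread.
        return -1;
    }
}
```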