[BUG] invehicle-digital-twin sometimes becomes unresponsive after calling the register API with the managed_subscribe feature #73
Labels: bug
Description
Steps to reproduce:
- Run invehicle-digital-twin with the managed_subscribe feature enabled.
- Call the register API. The service sometimes becomes unresponsive after this call.

This is sometimes also observed when the application starts up without external stimulus, though this is not well investigated and repro steps are not yet available.
Workarounds
What has been tried so far:
- It was initially suspected that the cause was the use of futures::executor::block_on in a tokio async context here, but this is now considered unlikely based on work in the wilyle/block_on branch, which replaces it with a tokio equivalent.
- Adding a tokio::time::timeout does not appear to solve the issue either. It's suspected that something is hogging the tokio runtime and not yielding to the timeout task, making it unable to exit even when the time has expired (see the sketch after this list).
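To illustrate that suspicion (a minimal standalone sketch, not Ibeji code), tokio::time::timeout cannot fire while the wrapped work blocks its worker thread instead of yielding:

```rust
use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // tokio::time::timeout only helps if the wrapped future yields back to
    // the runtime; the timer is just another piece of work that needs the
    // thread in order to make progress.
    let result = tokio::time::timeout(Duration::from_secs(1), async {
        // Stand-in for work that blocks the worker thread instead of awaiting
        // (a synchronous block_on call or long-running computation behaves the
        // same way). The runtime never gets a chance to notice the deadline.
        std::thread::sleep(Duration::from_secs(10));
    })
    .await;

    // Prints only after the full 10 seconds, and the result is Ok(()) rather
    // than an Elapsed error, because the inner future completed before the
    // runtime could ever check the timer.
    println!("{result:?}");
}
```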
Other observations:
When enabling the tokio console (which has been done in the linked wilyle/block_on branch), the following state is noted when Ibeji becomes unresponsive:
Of particular note is the first task, which persists in the busy state indefinitely and is never polled again. This behavior is indicative of a task that is blocking its thread. It's unclear what originates the call to hyper::common::exec, but based on trace logs from a successful request it's believed to be related to the service interceptor infrastructure. In a successful API call, there are two calls to this function: one at the very beginning of request processing, and another sometime after the GrpcInterceptorService has completed its processing (though the exact timing is unknown). The full trace log will be added in a comment to prevent an already very verbose issue from getting longer (see #73 (comment)).

As mentioned in the workarounds, enabling trace logging to try to get more information makes the issue very difficult to reproduce, and I've been unable to reproduce it with trace logging enabled. This suggests some kind of race condition that's affected by I/O side effects. The highest log level that yields consistently reproducible behavior is debug, at least when logs go to stdout; it may be worth trying again with logs going somewhere else. Note, however, that when using the tokio console integration, trace logs appear to be redirected to the tokio console and will not appear elsewhere, so trace logs might not be viewable while using this tool.
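Since the console integration seems to capture trace output, one thing that may be worth trying (a sketch only, not the setup actually used on the wilyle/block_on branch) is composing the tokio-console layer with an ordinary fmt layer so that log events still reach stdout:

```rust
use tracing_subscriber::prelude::*;

fn init_tracing() {
    // console_subscriber::spawn() returns a tracing layer that feeds task
    // instrumentation to tokio-console (the binary must be built with
    // RUSTFLAGS="--cfg tokio_unstable"). Composing it with a fmt layer keeps
    // ordinary log events visible on stdout rather than having them captured
    // only by the console integration.
    tracing_subscriber::registry()
        .with(console_subscriber::spawn())
        .with(tracing_subscriber::fmt::layer())
        .init();
}
```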
Next steps:
The following are potential steps to consider for further investigation or resolution:
- Investigate where the hyper::common::exec calls are made. How is it related to the service interceptor infrastructure? What operations could potentially hang indefinitely?
- Rework the block_on pattern to use some form of polling for completion or a one-shot channel for receiving the result (see the sketch after this list).
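As one possible shape for the channel approach (hypothetical code with made-up names such as perform_managed_subscribe_setup, not the actual Ibeji implementation), a tokio oneshot channel lets the caller await the result without ever blocking the worker thread:

```rust
use tokio::sync::oneshot;

// Hypothetical sketch: the work that previously had to be driven to
// completion with block_on runs as its own task and reports back through a
// oneshot channel, while the caller awaits the receiver and therefore yields
// to the runtime while waiting.
async fn get_subscription_uri() -> Result<String, oneshot::error::RecvError> {
    let (tx, rx) = oneshot::channel();

    tokio::spawn(async move {
        // Placeholder for the managed_subscribe setup work.
        let uri = perform_managed_subscribe_setup().await;
        // Ignore the case where the receiver has already been dropped.
        let _ = tx.send(uri);
    });

    // Awaiting the receiver never blocks the thread, so timeouts and other
    // tasks on the runtime can still make progress.
    rx.await
}

// Stand-in for the real setup logic.
async fn perform_managed_subscribe_setup() -> String {
    "grpc://managed-subscribe-endpoint".to_string()
}
```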
Acceptance criteria
Comments

Here are trace-level logs from starting up Ibeji and submitting a single successful register call. Note the inclusion of 3 warning-level logs from the wilyle/block_on branch, which are not present in the main branch. These messages were logged at warning level just to improve their visibility when reading through verbose logs to determine the system state, and are not indicative of a problematic state: