dataman: Add client sync perf counter and increase default timeout to 5s #22845
Solved Problem
Dataman requests occasionally time out during boot. My suspicion is that it's a combination of a slow SD card and an unfortunate scheduling order during boot, when the CPU is very busy. The 1s timeout may be reached more quickly than expected.
However, it is notoriously difficult to reproduce this issue under controlled and debuggable/traceable conditions. I was only able to catch it once, and the `dataman: write` perf counter "only" recorded most=310ms. However, on unrelated traces I've seen scheduling latencies on the dataman thread in the hundreds of milliseconds, for example 271ms.
Solution
This increases the dataman client timeout to 5s and logs the time it took to service the sync requests.
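The client-side measurement described above can be sketched roughly as follows. This is a minimal illustration in standard C++, not the PX4 perf counter API or the actual dataman client code: `ElapsedCounter`, `SyncChannel`, and `sync_request` are hypothetical names, and the server thread is simulated with a sleep.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an elapsed-time perf counter (NOT the PX4 perf API).
struct ElapsedCounter {
    uint64_t events{0};
    std::chrono::microseconds most{0};

    void record(std::chrono::microseconds elapsed) {
        ++events;
        if (elapsed > most) { most = elapsed; }
    }
};

// Hypothetical one-shot response channel between a client and a server thread.
class SyncChannel {
public:
    // Called by the server thread once the request has been serviced.
    void respond() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _done = true;
        }
        _cv.notify_one();
    }

    // Called by the client: blocks until the response arrives or the timeout expires.
    bool wait(std::chrono::milliseconds timeout) {
        assert(timeout.count() > 0);
        std::unique_lock<std::mutex> lock(_mutex);
        return _cv.wait_for(lock, timeout, [this] { return _done; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _done{false};
};

// Simulates one sync request. The time is measured on the client side, so it
// includes any delay before the server thread gets to run and respond.
bool sync_request(std::chrono::milliseconds service_delay,
                  std::chrono::milliseconds timeout,
                  ElapsedCounter &counter) {
    SyncChannel channel;
    const auto start = std::chrono::steady_clock::now();

    // Simulated server: services the request after service_delay.
    std::thread server([&] {
        std::this_thread::sleep_for(service_delay);
        channel.respond();
    });

    const bool ok = channel.wait(timeout);
    counter.record(std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start));

    server.join();
    return ok;
}
```

In this sketch, calling `sync_request` with a 50ms service delay and a 5s timeout succeeds and leaves the waited time in `counter.most`, while a service delay longer than the timeout makes the call return `false` but still records how long the client waited.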
There are already two perf counters, `dataman: write` and `dataman: read`. However, those run in the dataman thread, while the client runs on another thread. Thus, adding this perf counter will show scheduling latencies between the dataman and its client threads.
Changelog Entry
For release notes:
Alternatives
Dataman request timeouts are bad in any case, since the data is needed to fulfill the mission. There really isn't any fallback when no data arrives.
Context
See also #22778.