Stats actions should discard intermediate state on cancellation #82337
Labels
>bug
:Data Management/Stats
Team:Data Management
Most stats actions fan out to various nodes in the cluster and collect per-node responses, which are then aggregated into the final result. The per-node responses can sometimes be many MBs in size. If the client cancels the request by closing its connection, we broadcast the cancellation to all the target nodes and wait for them to respond with a `TaskCancelledException` before discarding the intermediate results. One of the target nodes may take many minutes to respond to the cancellation if, for instance, it is overwhelmed by GC activity. In that case we retain many MBs of unnecessary intermediate state for many minutes.

We should instead react to the cancellation by immediately discarding the intermediate results and dropping any further results that arrive, freeing up this unnecessary memory. One possible way to do this would be to allow a `CancellableTask` to accumulate listeners which are completed by `CancellableTask#onCancelled()`.

Relates #55550 (comment), which contains a list of some of the more important cases of this to address.
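The listener-based idea can be illustrated with a minimal, self-contained sketch in plain Java with no Elasticsearch dependencies. `CancellableTaskSketch`, `addCancellationListener`, `NodeResponseCollector` and `onNodeResponse` are hypothetical names invented for illustration; they are not the real `CancellableTask` API, and real stats actions are structured differently. The task accumulates listeners and completes them from `onCancelled()`. The collector registers one such listener that clears the buffered per-node responses and drops any response that arrives afterwards, so the memory is released without waiting for slow nodes to acknowledge the cancellation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo entry point, declared first so `java CancellationSketchDemo.java` can run it.
class CancellationSketchDemo {
    public static void main(String[] args) {
        CancellableTaskSketch task = new CancellableTaskSketch();
        NodeResponseCollector<byte[]> collector = new NodeResponseCollector<>(task);
        collector.onNodeResponse("node-1", new byte[1024]);
        collector.onNodeResponse("node-2", new byte[1024]);
        System.out.println("retained before cancel: " + collector.retainedCount()); // 2
        task.cancel(); // e.g. the client closed its connection
        collector.onNodeResponse("node-3", new byte[1024]); // late response is dropped
        System.out.println("retained after cancel: " + collector.retainedCount()); // 0
    }
}

// HYPOTHETICAL stand-in for CancellableTask: only models accumulating
// listeners that are completed by onCancelled().
class CancellableTaskSketch {
    private final List<Runnable> cancellationListeners = new ArrayList<>();
    private boolean cancelled = false;

    // Hypothetical method: registers a listener, running it at once if already cancelled.
    void addCancellationListener(Runnable listener) {
        synchronized (this) {
            if (!cancelled) {
                cancellationListeners.add(listener);
                return;
            }
        }
        listener.run();
    }

    synchronized boolean isCancelled() {
        return cancelled;
    }

    void cancel() {
        synchronized (this) {
            if (cancelled) {
                return; // cancel at most once
            }
            cancelled = true;
        }
        onCancelled();
    }

    // Completes every accumulated listener, running them outside the lock.
    protected void onCancelled() {
        List<Runnable> toRun;
        synchronized (this) {
            toRun = new ArrayList<>(cancellationListeners);
            cancellationListeners.clear();
        }
        toRun.forEach(Runnable::run);
    }
}

// HYPOTHETICAL collector for the per-node responses of a fanned-out stats request.
class NodeResponseCollector<T> {
    private final Map<String, T> responses = new HashMap<>();
    private boolean discarded = false;

    NodeResponseCollector(CancellableTaskSketch task) {
        // Free intermediate state as soon as the task is cancelled,
        // without waiting for the remaining nodes to acknowledge.
        task.addCancellationListener(this::discard);
    }

    synchronized void onNodeResponse(String nodeId, T response) {
        if (discarded) {
            return; // drop responses that arrive after cancellation
        }
        responses.put(nodeId, response);
    }

    synchronized int retainedCount() {
        return responses.size();
    }

    private synchronized void discard() {
        discarded = true;
        responses.clear();
    }
}
```

Clearing the map only frees memory if nothing else still references the responses; in this sketch the collector is their sole owner, so cleared and dropped responses become eligible for garbage collection immediately.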