You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue is a retrospective describing lesson's learnt while implementing the transmission of Stimulus ID's within the Scheduler using Context Variables. Context Variables were introduced in Python 3.7. Quoting PEP-567:
This PEP proposes a new contextvars module and a set of new CPython C APIs to support context variables. This concept is similar to thread-local storage (TLS), but, unlike TLS, it also allows correctly keeping track of values per asynchronous task, e.g. asyncio.Task.
The potential of ContextVar's here is that there is less need to pass state around the Scheduler internals and thereby avoid invasive code changes. In particular, they seem to be a good fit for implementing stimulus_id transmission through the scheduler. The idea is that there would be a global STIMULUS_ID ContextVar set in strategic locations within the scheduler (i.e. RPC Handler functions). There would then be no need to pass stimuli as function arguments throughout the scheduler, leading to less invasive changes -- scheduler functions could just call STIMULUS_ID.get().
Stimulus ID's are a method of tracking which event generated a series of rpc calls within a dask/distributed cluster.
They form part of a larger distributed tracing effort within dask/distributed.
They are frequently generated by the client and passed to and fro between Scheduler and Workers via RPC Handlers.
In the context of Transition tracing for scheduler task transitions #5849, it is not important to distinguish between Client and Scheduler for an update_graph_hlg event -- stimuli are in practice generated on the Scheduler although in principle the event transmitting the graph occurred on the Client.
Issues
distributed scheduler handlers often call other handlers internally. update_graph_hlg calls update_graph, which in turn calls client_releases_keys. This implies that stimuli should not be overridden.
This can result in the following sort of coding style:
defclient_releases_keys(..., stimulus_id=None):
try:
token=NoneSTIMULUS_ID.get() # Perhaps the STIMULUS_ID ContextVar was set in update_graphexceptLookupError:
# Perhaps it's was passed by an external entity, but perhaps it wasn't?token=STIMULUS_ID.set(stimulus_idorf"client-releases-keys")
try:
# Actual client_releases_keys implementation, now heavily indented
...
finally:
# We should try to reset the STIMULUS_ID once we leave the function# because client_releases_keys is called frequently in our test suite and we# want other functions to be able to set the STIMULUS_IDiftoken:
STIMULUS_ID.reset(token)
As STIMULUS_ID is a global variable, it was important to reset the token due to the extensive use of rpc handler calls within the test suite. Without this, STIMULUS_ID would be permanently set on the first call to client_releases_keys, for example.
A variety of approaches were attempted to handle this complexity
Support stimulus_id's via Wrapped RPC handlers and ContextVars #6010. Here, RPCHandler classes wrapped Scheduler functions. STIMULUS_ID was set if it was provided as a kwarg during the RPC call, otherwise it was automatically created based on the function name. However, wrapped RPCHandler classes were deemed to be an excessive change at this point in time.
Stimulus id contextvars explicit handlers #6083 attempted to create a very clear boundary between RPC handler functions and internal scheduler functionality by creating for e.g. handle_update_graph and handle_update_graph_hlg handlers which merely called the update_graph and update_graph_hlg functions respectively. This removed much of the complexity about deciding if a previously called handler had set the stimulus_id, but required explicitly calling handlers and setting stimulus_id's in the test suite.
A common necessity for all these changes was the need to call self.loop.addcallback(copy_context().run, handler). This is necessary because of the possibility of crossing a thread boundary and the need to copy contexts. In particular, Tornado doesn't always handle ContextVars resets correctly in versions prior to 6.1.
Result
Given:
the complexity around the interaction of stimuli and handlers.
The uncertainty around Tornado ContextVar resets, and
This issue is a retrospective describing lesson's learnt while implementing the transmission of Stimulus ID's within the Scheduler using Context Variables. Context Variables were introduced in Python 3.7. Quoting PEP-567:
The potential of ContextVar's here is that there is less need to pass state around the Scheduler internals and thereby avoid invasive code changes. In particular, they seem to be a good fit for implementing stimulus_id transmission through the scheduler. The idea is that there would be a global
STIMULUS_ID
ContextVar set in strategic locations within the scheduler (i.e. RPC Handler functions). There would then be no need to pass stimuli as function arguments throughout the scheduler, leading to less invasive changes -- scheduler functions could just callSTIMULUS_ID.get()
.update_graph_hlg
event -- stimuli are in practice generated on the Scheduler although in principle the event transmitting the graph occurred on the Client.Issues
update_graph_hlg
callsupdate_graph
, which in turn callsclient_releases_keys
. This implies that stimuli should not be overridden.client_releases_keys
, for example.Approaches
default_stimulus_id
contextmanager for RPCHandlers, but this introduced too much indentation into the existing code-basehandle_update_graph
andhandle_update_graph_hlg
handlers which merely called theupdate_graph
andupdate_graph_hlg
functions respectively. This removed much of the complexity about deciding if a previously called handler had set the stimulus_id, but required explicitly calling handlers and setting stimulus_id's in the test suite.self.loop.addcallback(copy_context().run, handler)
. This is necessary because of the possibility of crossing a thread boundary and the need to copy contexts. In particular, Tornado doesn't always handle ContextVars resets correctly in versions prior to 6.1.Result
Given:
It was decided to reverted to a simpler, but more invasive argument passing approach in Support Stimulus ID's via argument passing #6095
The text was updated successfully, but these errors were encountered: