Logging standardization - Contextual logging - Structured logging #4762
Comments
I was made aware of https://eliot.readthedocs.io/en/stable/introduction.html and https://github.com/Delgan/loguru. Happy to hear about more libs and/or get feedback on the ones already mentioned.
Configuring logging is also challenging. Our docs do demonstrate how to log to disk, but the example is a little buried. xref: #3669
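For reference, logging to disk with the standard library can be sketched as below. This is a minimal illustration using `logging.config.dictConfig`; the file path, formatter name, and levels are assumptions made for the example, and whether a given Dask version accepts the same dictionary in its own configuration should be checked against the docs.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.mkdtemp(), "distributed.log")

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        # Write records to a file instead of stderr.
        "file": {
            "class": "logging.FileHandler",
            "filename": log_path,
            "formatter": "plain",
            "level": "INFO",
        },
    },
    "loggers": {
        # Child loggers such as "distributed.scheduler" propagate up to this one.
        "distributed": {"handlers": ["file"], "level": "INFO", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("distributed.scheduler").info("scheduler started")
```

Every logger below `distributed` then writes to the file, which is the behaviour people tend to look for in the configuration reference.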
cc @itamarst
While this is prone to change based on this discussion, would it be worthwhile to give more information on log config options in the configuration reference? I'm not sure how heavily trafficked that page is, but I recall going there to look for log config options (such as the ability to output to a file) and assuming they didn't exist because none were listed.
So structlog is structured logging, which is a lot better than just strings of text. The problem is that's all it is: messages at particular points in time. Eliot is fundamentally different: it gives you causality, and a notion of actions that start and end. The output is a tree of actions (or really a forest of actions).
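To make the idea of actions concrete, here is a rough standard-library sketch. It is not Eliot's API; the `action` helper, the record fields, and the in-memory `records` sink are all hypothetical names chosen for illustration. Each action logs a start and an end, and nested actions record their parent, which is what lets the output be rebuilt as a tree.

```python
import contextlib
import contextvars
import itertools

# Hypothetical helper names; this is not Eliot's API, only the idea behind it.
_current_action = contextvars.ContextVar("current_action", default=None)
_ids = itertools.count(1)
records = []  # in-memory sink, for illustration only


@contextlib.contextmanager
def action(name, **fields):
    """Record the start and end of an action, linked to its parent action."""
    action_id = next(_ids)
    parent_id = _current_action.get()
    base = {"id": action_id, "parent": parent_id, "action": name}
    records.append({**base, "status": "started", **fields})
    token = _current_action.set(action_id)
    try:
        yield action_id
    except Exception as exc:
        records.append({**base, "status": "failed", "error": repr(exc)})
        raise
    else:
        records.append({**base, "status": "succeeded"})
    finally:
        # Restore the enclosing action so siblings get the right parent.
        _current_action.reset(token)


with action("compute", key="x"):
    with action("load"):
        pass
    with action("transform"):
        pass
```

Here `load` and `transform` both carry `compute` as their parent, so a reader of the log can see which step belonged to which larger operation rather than a flat list of messages.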
See https://pythonspeed.com/articles/logging-for-scientific-computing/ for more. I gave a Dask variant of this talk at the summit earlier this year; I'm not sure if video is available. Eliot is one way to do this. It has Dask Distributed support built-in, for users of Distributed: https://eliot.readthedocs.io/en/stable/scientific-computing.html Another alternative is OpenTelemetry, which is attractive because there is a lot of existing tooling for it: a bunch of SaaS platforms and tracing systems support it. Bigger picture perspective: if Dask Distributed has a good tracing/logging setup, and users are encouraged to use the same framework, users get to see logs that connect not just their own logic but also how the distributed system is scheduling everything. That is probably useful for performance optimization.
Logging is often a crucial instrument for debugging, and we currently do it in several different ways:
- `logging` for human readable messages without contextual information
- `deque` with some context information, e.g.
  - `Scheduler.log_event` / `Scheduler.events`: Logs external stimuli from workers and clients as an event in a dictionary by source
  - `Scheduler.transition_log`: Exclusively used to log transitions in a semi-structured format `(key, start, finish, recommendations, timestamp)`
  - `Worker.log`: Unstructured. Part events, part transitions, sometimes with timestamps

The problems I see with this approach are multifold
- `deque` logging has frequently been the cause of memory-related trouble, since the deques accumulate memory over time and users are often not aware of this. We need to artificially limit the number of log entries kept, with options like `transition-log-length`, `events-log-length`, `events-cleanup-delay`, etc.
- `story` but we need to write specialized functions for every possible query
  distributed/distributed/worker.py, lines 1946 to 1958 in b577ece
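As a simplified illustration of these two points (not the actual code in `distributed`), a bounded `deque` and a `story`-style query might look like the sketch below. The constant `TRANSITION_LOG_LENGTH` and the helper `log_transition` are hypothetical stand-ins; the tuple layout follows the `(key, start, finish, recommendations, timestamp)` format mentioned above.

```python
from collections import deque

# Hypothetical limit, standing in for an option such as transition-log-length.
TRANSITION_LOG_LENGTH = 3

# Once maxlen is reached, the oldest entries are silently discarded.
transition_log = deque(maxlen=TRANSITION_LOG_LENGTH)


def log_transition(key, start, finish, recommendations, timestamp):
    """Append one transition as a positional tuple."""
    transition_log.append((key, start, finish, recommendations, timestamp))


def story(*keys):
    """Return logged transitions that mention any of the given keys."""
    wanted = set(keys)
    return [
        t for t in transition_log
        if t[0] in wanted or wanted & set(t[3])  # t[3] maps key -> recommended state
    ]


log_transition("x", "released", "waiting", {}, 1.0)
log_transition("x", "waiting", "processing", {}, 2.0)
log_transition("y", "released", "waiting", {}, 3.0)
log_transition("x", "processing", "memory", {"y": "processing"}, 4.0)
```

The first transition is already gone by the time the fourth is logged, and `story` only works because it hard-codes the tuple positions. Any other question about the log would need another function written in the same way.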
Most, if not all, of the above described issues can be addressed by custom solutions.
For instance
Instead of doing all of this ourselves, we could also rely on libraries that do a great job of encapsulating it in easy-to-use APIs. One library I am decently familiar with, and which is quite popular, is structlog. I was wondering whether this is something we are interested in.
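To give a feel for what contextual, structured logging means in practice, here is a rough sketch built only on the standard library. It is not structlog's implementation; structlog offers bound loggers with a `bind` method, and the `ContextLogger` and `JSONFormatter` classes below are hypothetical names that imitate that idea on top of `logging.LoggerAdapter`. The logger name and context values are made up for the example.

```python
import io
import json
import logging


class JSONFormatter(logging.Formatter):
    """Render each record as one JSON object, including bound context."""

    def format(self, record):
        payload = {"event": record.getMessage(), "level": record.levelname.lower()}
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


class ContextLogger(logging.LoggerAdapter):
    """Attach key/value context to every record it emits."""

    def bind(self, **new_context):
        # Return a new adapter so the original context is left untouched.
        return ContextLogger(self.logger, {**self.extra, **new_context})

    def process(self, msg, kwargs):
        kwargs["extra"] = {"context": dict(self.extra)}
        return msg, kwargs


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONFormatter())
base = logging.getLogger("sketch.worker")  # hypothetical logger name
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False

log = ContextLogger(base, {"worker": "tcp://127.0.0.1:1234"})
task_log = log.bind(key="x-123")
task_log.info("transition")
```

Each emitted line is a JSON object carrying the event name plus the bound `worker` and `key`, so it can be filtered by level and by field without a purpose-built query function.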