Proposal 5
Logging in LDMSD has known scaling and other developer-visible issues. This proposal aims to produce an API and a recommended usage pattern that alleviate them. Prior discussion of logging changes took place as part of https://github.com/ovis-hpc/ovis/pull/448 . Ben Allan presented slides at the 10/26/2020 LDMS User's Group teleconference; they have been extended into this page to gather additional community input (see also: https://github.com/ovis-hpc/ovis/wiki/resources/UGTelecons/LDMS-log.pdf).
- Desire to divert specific messages to an independent log file
- Desire to format some messages not as log messages but as configuration sequence replay
- Logs can get very large (GB/minute), dependent on level.
- Lack of filtering, except by global threshold, leading to incorrectly high or low priority settings in code.
- Lack of scope-based filtering.
- Too many (?) destination options: stdout, log file, syslog.
- Filtering by a threshold, rather than a bit list of priorities of interest.
- Old implementations in source tree unused (lib/ olog).
- Access to the logging implementation is inconsistent: some code calls the function ldmsd_log() in main() (rather than a library function), while other code calls printf-style (ldmsd_log_warning()) or syslog(3)-style function pointers passed through structures or function signatures.
- The LDMS daemon should allow the code developers to group related log messages by facility identifier (a string name and 1:1 run-time generated opaque look-up tag) so that users can specify output filtering specific to that facility.
- The LDMS daemon should be user-configurable to emit messages at only the priority levels and facilities of interest to the user.
- Determining whether a message shall be logged or not at the time the log() operation is called must be O(1) in time cost.
- Memory required to manage the log filtering shall be O(N) in memory, where N is the number of unique facility names.
- The API shall be thread-safe.
- The API shall be compatible with third party plugins defining their own named facilities that are not known at compilation time of the ldmsd program.
- The API implementation shall be scalable to a large number of facilities, so as to allow defining per-instance facilities and not just per-class or per-file facilities.
- Log messages shall (at least optionally) be able to be formatted in O(1) time with both the string formatted priority and facility string prepended to the message.
- Log messages shall go to the default log destination (as controlled from the command line or option specification).
- Log messages for a specific facility or facility match expression shall go to an alternate user-defined file.
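The O(1) filtering and O(N) memory requirements above can be met with one small bit mask per facility, indexed by the facility tag. The following is a minimal sketch under stated assumptions, not the ldmsd implementation: every `demo_` name and the set of levels are hypothetical, the tag is assumed to be a plain array index, and the locking that the thread-safety requirement would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical priority levels; the real enum ldms_log_level may differ. */
enum demo_level {
    DEMO_DEBUG, DEMO_INFO, DEMO_WARNING, DEMO_ERROR, DEMO_CRITICAL,
    DEMO_LEVEL_COUNT
};

/* One mask byte per facility: bit i set means level i is enabled. */
struct demo_filter_table {
    uint8_t *masks;   /* O(N) storage, N = number of facilities */
    size_t count;     /* facilities registered so far */
    size_t capacity;  /* allocated entries */
};

/* Register a facility with an initial mask.
 * Returns its tag (an index), or -1 if memory cannot be allocated. */
int demo_table_add(struct demo_filter_table *t, uint8_t initial_mask)
{
    if (t->count == t->capacity) {
        size_t ncap = t->capacity ? t->capacity * 2 : 8;
        uint8_t *grown = realloc(t->masks, ncap);
        if (!grown)
            return -1;
        t->masks = grown;
        t->capacity = ncap;
    }
    t->masks[t->count] = initial_mask;
    return (int)t->count++;
}

/* O(1) decision: one bounds check, one array load, one bit test. */
bool demo_filter(const struct demo_filter_table *t, int tag,
                 enum demo_level level)
{
    assert((int)level >= 0 && level < DEMO_LEVEL_COUNT);
    if (tag < 0 || (size_t)tag >= t->count)
        return false;
    return (t->masks[tag] >> level) & 1u;
}
```

Because the mask is a bit list rather than a threshold, a facility can enable, say, DEBUG and ERROR while leaving INFO and WARNING off.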
The following are dependent on the developer of a specific facility, not the logging library API. Failure to follow these guidelines will result in filters being difficult to configure using partial names and wildcards.
- Facility names should be reproducible from one execution to the next, to facilitate reusable configuration files and debugging.
- Facility names should be dot-hierarchical, with more specific scopes coming last in the name, e.g. a specific sampler instance might be sampler.$Plugin.$instance or sampler.meminfo.node25. This enables wildcard matching of facility names with a similar scope, e.g. filter: "sampler.%=debug".
- Facility names should not contain square brackets [] or commas, so that HPC node list formats can be used for defining filters. E.g. sampler.stream.node[1-5,7]
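To illustrate why dot-hierarchical names make filters easy to write, here is a rough sketch of matching a filter such as "sampler.%=debug" against facility names. The semantics are an assumption made for illustration: a trailing '%' matches any suffix, anything else must match exactly, and the level name follows the last '='. The `demo_` functions are hypothetical, and node list expansion such as [1-5,7] is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if name matches pattern.
 * Assumed semantics: a trailing '%' matches any suffix (including none);
 * without it, the whole name must be equal. */
bool demo_facility_match(const char *pattern, const char *name)
{
    assert(pattern && name);
    size_t plen = strlen(pattern);
    if (plen > 0 && pattern[plen - 1] == '%')
        return strncmp(pattern, name, plen - 1) == 0;
    return strcmp(pattern, name) == 0;
}

/* Split "pattern=level" in place at the last '='.
 * On success spec holds the pattern and the level string is returned;
 * returns NULL (spec unchanged) if there is no '='. */
char *demo_split_filter(char *spec)
{
    assert(spec);
    char *eq = strrchr(spec, '=');
    if (!eq)
        return NULL;
    *eq = '\0';
    return eq + 1;
}
```

With these rules, "sampler.%" selects sampler.meminfo.node25 and every other sampler instance, while "sampler.meminfo" selects only the facility with exactly that name.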
(1) New library API 'ldms_log' and helpers with a developer-supplied facility name string and an integer facility tag.
In the following, TYPE is an integer type or opaque pointer to be determined.
// log a message from facility at a priority. 'facility' comes from facility_get call.
int ldms_log(TYPE facility, enum ldms_log_level priority, const char *msg_format, ...);
// define a new facility name and get its tag. If name is already taken, returns -1 or NULL (determined by TYPE implementation).
TYPE ldms_log_facility_get(const char *facility_name);
// return true if messages at level priority for the facility should be logged.
bool ldms_log_filter(TYPE facility, enum ldms_log_level priority);
// retire a facility.
void ldms_log_facility_put(TYPE facility);
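A possible use pattern for these four calls is sketched below with a stand-in implementation so that it compiles on its own. Everything behind the function names is an assumption: TYPE is taken to be an opaque pointer, the level names and the default mask (WARNING and ERROR) are hypothetical, the return value of ldms_log() is assumed to be the length of the formatted message (0 when filtered), output goes to stderr, and the registry is a small fixed array with no locking.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical level names; the proposal does not list them. */
enum ldms_log_level { LDMS_LOG_DEBUG, LDMS_LOG_INFO, LDMS_LOG_WARNING, LDMS_LOG_ERROR };

struct ldms_log_facility { char *name; uint8_t priority_filter_bits; };
typedef struct ldms_log_facility *ldms_log_facility_t; /* TYPE, assumed opaque pointer */

#define DEMO_MAX_FACILITIES 64
static struct ldms_log_facility *demo_registry[DEMO_MAX_FACILITIES];

/* Define a facility; returns NULL if the name is taken or no slot is free. */
ldms_log_facility_t ldms_log_facility_get(const char *facility_name)
{
    assert(facility_name != NULL);
    int free_slot = -1;
    for (int i = 0; i < DEMO_MAX_FACILITIES; i++) {
        if (!demo_registry[i]) {
            if (free_slot < 0)
                free_slot = i;
        } else if (strcmp(demo_registry[i]->name, facility_name) == 0) {
            return NULL; /* name already taken */
        }
    }
    if (free_slot < 0)
        return NULL;
    struct ldms_log_facility *f = malloc(sizeof(*f));
    if (!f)
        return NULL;
    f->name = malloc(strlen(facility_name) + 1);
    if (!f->name) {
        free(f);
        return NULL;
    }
    strcpy(f->name, facility_name);
    /* Assumed default: only WARNING and ERROR are enabled. */
    f->priority_filter_bits = (1u << LDMS_LOG_WARNING) | (1u << LDMS_LOG_ERROR);
    demo_registry[free_slot] = f;
    return f;
}

/* O(1) check: a single bit test on the facility object. */
bool ldms_log_filter(ldms_log_facility_t facility, enum ldms_log_level priority)
{
    return facility && ((facility->priority_filter_bits >> priority) & 1u);
}

/* Emit "PRIORITY: facility: message" to stderr if the filter allows it. */
int ldms_log(ldms_log_facility_t facility, enum ldms_log_level priority,
             const char *msg_format, ...)
{
    static const char *names[] = { "DEBUG", "INFO", "WARNING", "ERROR" };
    if (!ldms_log_filter(facility, priority))
        return 0;
    fprintf(stderr, "%s: %s: ", names[priority], facility->name);
    va_list ap;
    va_start(ap, msg_format);
    int n = vfprintf(stderr, msg_format, ap);
    va_end(ap);
    return n;
}

/* Retire a facility and free its name for reuse. */
void ldms_log_facility_put(ldms_log_facility_t facility)
{
    if (!facility)
        return;
    for (int i = 0; i < DEMO_MAX_FACILITIES; i++)
        if (demo_registry[i] == facility)
            demo_registry[i] = NULL;
    free(facility->name);
    free(facility);
}

/* Hypothetical call site: check the filter before building debug output. */
int demo_sampler_step(ldms_log_facility_t log, int rc)
{
    if (ldms_log_filter(log, LDMS_LOG_DEBUG))
        ldms_log(log, LDMS_LOG_DEBUG, "step finished rc=%d\n", rc);
    if (rc)
        return ldms_log(log, LDMS_LOG_ERROR, "read failed: %d\n", rc);
    return 0;
}
```

A plugin instance would call ldms_log_facility_get() once when it is created, keep the returned handle for its lifetime, and call ldms_log_facility_put() when it is destroyed. The explicit ldms_log_filter() check is only worthwhile when computing the message arguments is expensive, since ldms_log() applies the same filter itself.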
- No need to reinvent the wheel: See and improve (the configuration bits): https://github.com/HardySimpson/zlog (http://hardysimpson.github.io/zlog/UsersGuide-EN.html)
The merits and demerits of the alternatives above, preferably based on examples and (where needed) prototype implementations.
The following merits and demerits have been observed for (1).
- The author of a plugin or other context can create multiple facilities (e.g. per-instance).
- The author of a plugin has no obvious way to handle conflicts with pre-existing facility names. A conflict could occur if they have chosen (perhaps unwittingly) to violate a facility naming convention. This suggests that the framework loading the plugin or creating the instance should always create the facility name and tag object, ensuring consistent naming practice, e.g. sampler.$plugin.$instance, where instance is derived from a human-readable index of the object, not from the string formatting of a pointer value.
- Can be made back-compatible with existing code, allowing a gradual migration.
- The implementation of the facility abstraction is exposed as an int64_t or pointer to a small object for fast O(1) filtering.
- This could be an abstract pointer, perhaps to something like struct ldms_log_facility { int64_t index; char *name; uint8_t priority_filter_bits; };
- Whichever representation is chosen (integer or opaque pointer), the caller of ldms_log must still hold the facility handle and pass it on every call. An opaque version allows us to later expand the options for handling messages on specific facilities, perhaps sending them to different logging plugins.
The agreed solution, implementation team, review/test team, go here.