Proposal 4
Benjamin Allan edited this page Dec 3, 2019 · 8 revisions
See also issue #116, "Lightweight job monitoring support (ljms) and simple user sampler": https://github.com/ovis-hpc/ovis/issues/116
Some of the alternative methods to obtain application information are:
- Use Caliper (LLNL) and pipe the data blobs via LDMS {if/when this combination is available}.
- Use the Kokkos sampler from LDMS to push JSON data sets periodically.
- Use the shm sampler (aka MPI sampler) to poll shared-memory binary data files written by the application.
- For progress detection, tail or filter a log file that the user specifies at run time.
- For configuration detection, try to automatically detect input files and copy them elsewhere.
- Have the application write directly to a network database (SQL, DSOS, etc.).
In many cases, even application developers are not in a position to enforce creation or location of log and configuration files. Many simulation control languages have include statements, making auto-discovery of configuration input impossible.
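For comparison with the baseline, the log-filtering alternative listed above might be sketched as follows. This is a minimal illustration, not an existing LDMS sampler: the function names `read_new_lines` and `scan_progress`, and the idea of taking the first regex capture group as the progress value, are assumptions made for the example.

```python
import re


def read_new_lines(path, offset):
    """Read complete lines appended to the log since byte `offset`.

    Returns (lines, new_offset). A trailing partial line is left for the
    next poll so that a half-written record is never parsed.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available yet
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def scan_progress(lines, pattern):
    """Count lines matching a user-specified regex.

    Returns (match_count, last_value), where last_value is the first
    capture group of the most recent match (or the whole match when the
    pattern has no groups), and None if nothing matched.
    """
    rx = re.compile(pattern)
    count, last = 0, None
    for line in lines:
        m = rx.search(line)
        if m:
            count += 1
            last = m.group(1) if rx.groups else m.group(0)
    return count, last
```

Even in this small form the approach depends on the user knowing both the log location and a pattern that identifies progress lines, which is the limitation noted above.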
See #116 for the initial list. Add extra items here.
- Baseline:
  - Canonical data location in /dev/shm/jobmon/$JOBID.config, $JOBID.progress, $JOBID.env
  - The base directory may be overridden by an admin- or user-supplied environment variable (or by an argument to scripted utilities).
  - New samplers
    - TOML jobmon file sampler
    - String-file-blob jobmon sampler with optional at-store decoding.
  - C class library API and supporting wrappers for developers/users to construct or parse data files.
    - Application defines event name/counter pairs for progress. Structured naming à la TOML.
    - Application defines scope/name/value tuples for configuration parameter capture.
    - Library captures data types and includes them in the text format somehow.
  - Store that is smart enough to just roll with schema changes.
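To make the baseline concrete, here is a rough sketch of how an application-side helper could resolve the canonical file location and write scope/name/value data as TOML text. It is written in Python for brevity rather than as the proposed C library, and several details are assumptions for illustration only: the environment variable name `JOBMON_DIR`, the function names, and the choice to let TOML literals carry the data types. Scope and name strings are assumed to be valid TOML bare or dotted keys.

```python
import json
import os
from pathlib import Path

# Hypothetical variable name; the proposal only says the base directory
# may be overridden through the environment.
BASE_ENV = "JOBMON_DIR"
DEFAULT_BASE = "/dev/shm/jobmon"


def jobmon_path(job_id, kind, env=None):
    """Return <base>/<job_id>.<kind> for kind in config, progress, env."""
    if kind not in ("config", "progress", "env"):
        raise ValueError(f"unknown jobmon file kind: {kind}")
    env = os.environ if env is None else env
    return Path(env.get(BASE_ENV, DEFAULT_BASE)) / f"{job_id}.{kind}"


def format_value(value):
    """Render a value as a TOML literal so its type survives in the text."""
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # JSON string escaping matches TOML basic strings in common cases.
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def write_tables(path, scopes):
    """Write {scope: {name: value}} as TOML tables, replacing the file atomically."""
    lines = []
    for scope, pairs in scopes.items():
        lines.append(f"[{scope}]")
        for name, value in pairs.items():
            lines.append(f"{name} = {format_value(value)}")
        lines.append("")
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("\n".join(lines))
    os.replace(tmp, path)  # a polling sampler never sees a half-written file
```

An application might call `write_tables(jobmon_path(os.environ["JOBID"], "progress"), {"events": {"timestep": 42}})` at each checkpoint, again assuming `JOBID` is how the job identifier is exposed. Writing to a temporary file and renaming keeps the sampler free of any locking protocol with the application.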
The merits and demerits of the alternatives, preferably based on examples and (where needed) prototype implementations.
requirement | caliper | kokkos | shm | detect file progress | detect config files | net database | baseline |
---|---|---|---|---|---|---|---|
free of ldmsd connect | no | no | yes | yes | yes | yes | yes |
human readable | no | yes | no | yes | yes | no | yes |
low or no parse cost | yes | no | yes | no | no | yes | yes |
bounded by API | no | yes | yes | maybe | maybe | yes | yes |
free of net FS | yes | yes | yes | maybe | no | no | yes |